Pale Fire API Guide

Overview

Pale Fire provides a RESTful API built with FastAPI for integrating knowledge graph search into your applications.

Quick Start

1. Install Dependencies

cd /path/to/palefire
pip install -r requirements.txt

2. Configure Environment

cp env.example .env
# Edit .env with your settings

3. Start the API Server

python api.py

Or run uvicorn directly (--reload restarts the server on code changes and is intended for development):

uvicorn api:app --host 0.0.0.0 --port 8000 --reload

4. Access the API

API Base URL: http://localhost:8000
Interactive Docs: http://localhost:8000/docs
ReDoc: http://localhost:8000/redoc

API Endpoints

GET `/` - Root

Get API status.

Response:

{
  "status": "ok",
  "message": "Pale Fire API is running"
}

GET `/health` - Health Check

Check API and database health.

Response:

{
  "status": "healthy",
  "message": "All systems operational",
  "database_stats": {
    "nodes": 1523,
    "relationships": 4567
  }
}
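
A deployment or test script can wait for the service to report healthy before sending traffic. The following is a minimal sketch using only the Python standard library. It assumes the server from Quick Start is reachable at http://localhost:8000 and that a healthy response carries "status": "healthy" as in the example above. The function names are illustrative and not part of Pale Fire.

```python
import json
import time
import urllib.error
import urllib.request

BASE_URL = "http://localhost:8000"  # assumption: local server from Quick Start


def is_healthy(payload) -> bool:
    """Return True when a /health payload reports a healthy status."""
    return isinstance(payload, dict) and payload.get("status") == "healthy"


def wait_until_healthy(base_url: str = BASE_URL, attempts: int = 10, delay: float = 1.0) -> bool:
    """Poll GET /health until it reports healthy or the attempts run out."""
    for _ in range(attempts):
        try:
            with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
                if is_healthy(json.load(resp)):
                    return True
        except (urllib.error.URLError, ValueError, OSError):
            # Server not reachable yet, an error status such as 503, or a non-JSON body
            pass
        time.sleep(delay)
    return False
```

Calling wait_until_healthy() before the first /ingest or /search request avoids spurious failures while Neo4j is still starting.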

GET `/config` - Get Configuration

Get current configuration.

Response:

{
  "neo4j_uri": "bolt://localhost:7687",
  "llm_provider": "ollama",
  "llm_model": "deepseek-r1:7b",
  "embedder_model": "nomic-embed-text",
  "search_method": "question-aware",
  "search_limit": 20,
  "ner_enabled": true
}

POST `/ingest` - Ingest Episodes

Ingest episodes into the knowledge graph.

Request Body:

{
  "episodes": [
    {
      "content": "Kamala Harris is the Attorney General of California.",
      "type": "text",
      "description": "Biography"
    },
    {
      "content": {
        "name": "Gavin Newsom",
        "position": "Governor",
        "state": "California"
      },
      "type": "json",
      "description": "Structured data"
    }
  ],
  "enable_ner": true
}

Response:

{
  "status": "success",
  "message": "Successfully ingested 2 episodes"
}
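
When ingesting many episodes, it can help to split them into several request bodies rather than sending one very large request. The sketch below builds /ingest bodies in the shape shown above. The helper names and the default batch size of 50 are assumptions for illustration; the guide does not state a maximum request size.

```python
from typing import Any, Iterable, Iterator


def make_episode(content: Any, description: str = "") -> dict:
    """Build one episode: dict content is sent as type "json", anything else as "text"."""
    if isinstance(content, dict):
        return {"content": content, "type": "json", "description": description}
    return {"content": str(content), "type": "text", "description": description}


def ingest_payloads(episodes: Iterable[dict], batch_size: int = 50,
                    enable_ner: bool = True) -> Iterator[dict]:
    """Yield /ingest request bodies holding at most batch_size episodes each."""
    if batch_size < 1:
        raise ValueError("batch_size must be at least 1")
    batch = []
    for episode in episodes:
        batch.append(episode)
        if len(batch) == batch_size:
            yield {"episodes": batch, "enable_ner": enable_ner}
            batch = []
    if batch:
        # Send whatever is left over as a final, smaller batch
        yield {"episodes": batch, "enable_ner": enable_ner}
```

Each yielded dictionary can be posted to /ingest as JSON, for example with requests.post(f"{BASE_URL}/ingest", json=body).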

POST `/search` - Search Knowledge Graph

Search the knowledge graph with intelligent ranking.

Request Body:

{
  "query": "Who was the California Attorney General in 2020?",
  "method": "question-aware",
  "limit": 5
}

Parameters:

query (required): Search query string
method (optional): Search method - standard, connection, or question-aware (default)
limit (optional): Maximum number of results (1-100, default: 5)
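
Checking these parameters on the client side gives clearer errors before a request is sent. This is a small sketch that mirrors the documented values and ranges; the function name is hypothetical, and the server remains the authority on validation.

```python
SEARCH_METHODS = ("standard", "connection", "question-aware")


def build_search_payload(query: str, method: str = "question-aware", limit: int = 5) -> dict:
    """Validate search parameters against the documented ranges and build the request body."""
    if not query or not query.strip():
        raise ValueError("query is required")
    if method not in SEARCH_METHODS:
        raise ValueError(f"method must be one of {SEARCH_METHODS}")
    if not 1 <= limit <= 100:
        raise ValueError("limit must be between 1 and 100")
    return {"query": query, "method": method, "limit": limit}
```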

Response:

{
  "query": "Who was the California Attorney General in 2020?",
  "method": "question-aware",
  "total_results": 5,
  "timestamp": "2025-12-26T14:45:00.123456+00:00",
  "results": [
    {
      "rank": 1,
      "uuid": "abc123-def456-ghi789",
      "name": "Kamala Harris",
      "summary": "Attorney General of California from 2011 to 2017...",
      "labels": ["Person", "PoliticalFigure"],
      "attributes": {
        "position": "Attorney General",
        "state": "California"
      },
      "scoring": {
        "final_score": 0.9234,
        "original_score": 0.8456,
        "connection_score": 0.7823,
        "temporal_score": 1.0,
        "query_match_score": 0.9123,
        "entity_type_score": 2.0
      },
      "connections": {
        "count": 15,
        "entities": [
          {
            "name": "California",
            "type": "LOC",
            "labels": ["Entity", "LOC"],
            "uuid": "xyz789-abc123-def456"
          }
        ],
        "relationship_types": ["WORKED_AT", "LOCATED_IN"]
      },
      "recognized_entities": {
        "PER": ["Kamala Harris"],
        "LOC": ["California"],
        "ORG": ["Attorney General"]
      }
    }
  ]
}
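
The nested response can be reduced to one line per result for logging or display. The sketch below reads only fields shown in the example response above; the function name is illustrative.

```python
def summarize_results(response: dict) -> list:
    """Return one human-readable line per search result."""
    lines = []
    for result in response.get("results", []):
        score = result["scoring"]["final_score"]
        # "connections" may be absent for some results, so default to zero
        connections = result.get("connections", {}).get("count", 0)
        lines.append(
            f"{result['rank']}. {result['name']} "
            f"(score {score:.4f}, {connections} connections)"
        )
    return lines
```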

POST `/keywords` - Extract Keywords

Extract important keywords from text using Gensim with configurable weights and methods.

Request Body:

{
  "text": "Artificial intelligence and machine learning are transforming technology. Deep learning algorithms process data efficiently.",
  "method": "tfidf",
  "num_keywords": 10,
  "min_word_length": 3,
  "max_word_length": 50,
  "use_stemming": false,
  "tfidf_weight": 1.0,
  "textrank_weight": 0.5,
  "word_freq_weight": 0.3,
  "position_weight": 0.2,
  "title_weight": 2.0,
  "first_sentence_weight": 1.5,
  "documents": null
}

Request Parameters:

text (required): Text to extract keywords from
method (optional): Extraction method (tfidf, textrank, word_freq, combined) - default: tfidf
num_keywords (optional): Number of keywords to extract (1-100) - default: 10
min_word_length (optional): Minimum word length (1-50) - default: 3
max_word_length (optional): Maximum word length (1-100) - default: 50
use_stemming (optional): Enable stemming - default: false
tfidf_weight (optional): Weight for TF-IDF scores (combined method) - default: 1.0
textrank_weight (optional): Weight for TextRank scores (combined method) - default: 0.5
word_freq_weight (optional): Weight for word frequency scores (combined method) - default: 0.3
position_weight (optional): Weight for position-based scoring - default: 0.2
title_weight (optional): Weight multiplier for words in titles/headers - default: 2.0
first_sentence_weight (optional): Weight multiplier for first sentence words - default: 1.5
documents (optional): Array of strings for document corpus (for IDF calculation in TF-IDF method)
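
The ranges listed above can be checked before calling the endpoint. The following sketch builds a /keywords request body and rejects out-of-range values and misspelled weight names. The helper name is hypothetical, and the check that min_word_length does not exceed max_word_length is an assumption rather than documented behavior.

```python
KEYWORD_METHODS = ("tfidf", "textrank", "word_freq", "combined")
WEIGHT_FIELDS = {
    "tfidf_weight", "textrank_weight", "word_freq_weight",
    "position_weight", "title_weight", "first_sentence_weight",
}


def build_keywords_payload(text: str, method: str = "tfidf", num_keywords: int = 10,
                           min_word_length: int = 3, max_word_length: int = 50,
                           **weights: float) -> dict:
    """Validate keyword-extraction parameters and build the request body."""
    if not text or not text.strip():
        raise ValueError("text is required")
    if method not in KEYWORD_METHODS:
        raise ValueError(f"method must be one of {KEYWORD_METHODS}")
    if not 1 <= num_keywords <= 100:
        raise ValueError("num_keywords must be between 1 and 100")
    if not 1 <= min_word_length <= 50:
        raise ValueError("min_word_length must be between 1 and 50")
    if not 1 <= max_word_length <= 100:
        raise ValueError("max_word_length must be between 1 and 100")
    if min_word_length > max_word_length:  # assumption: not stated in the guide
        raise ValueError("min_word_length must not exceed max_word_length")
    unknown = set(weights) - WEIGHT_FIELDS
    if unknown:
        raise ValueError(f"unknown weight fields: {sorted(unknown)}")
    payload = {
        "text": text,
        "method": method,
        "num_keywords": num_keywords,
        "min_word_length": min_word_length,
        "max_word_length": max_word_length,
    }
    payload.update(weights)  # only explicitly supplied weights are sent
    return payload
```

Fields that are left out fall back to the server-side defaults listed above.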

Response:

{
  "method": "tfidf",
  "num_keywords": 10,
  "keywords": [
    {
      "keyword": "artificial",
      "score": 0.8234
    },
    {
      "keyword": "intelligence",
      "score": 0.7123
    },
    {
      "keyword": "machine",
      "score": 0.6543
    },
    {
      "keyword": "learning",
      "score": 0.6123
    },
    {
      "keyword": "deep",
      "score": 0.5891
    }
  ],
  "parameters": {
    "num_keywords": 10,
    "min_word_length": 3,
    "max_word_length": 50,
    "use_stemming": false,
    "tfidf_weight": 1.0,
    "textrank_weight": 0.5,
    "word_freq_weight": 0.3,
    "position_weight": 0.2,
    "title_weight": 2.0,
    "first_sentence_weight": 1.5
  },
  "timestamp": "2025-01-15T10:30:00.123456+00:00"
}

Example Requests:

# Basic keyword extraction
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Artificial intelligence and machine learning are transforming technology.",
    "num_keywords": 5
  }'

# Using TextRank method
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "method": "textrank",
    "num_keywords": 10
  }'

# Combined method with custom weights
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "method": "combined",
    "tfidf_weight": 1.5,
    "textrank_weight": 0.8,
    "title_weight": 3.0
  }'

# With document corpus for better IDF calculation
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "method": "tfidf",
    "documents": [
      "Document 1 text...",
      "Document 2 text...",
      "Document 3 text..."
    ]
  }'

Error Responses:

{
  "detail": "Keyword extraction not available. Install gensim: pip install gensim>=4.3.0"
}

Status Code: 503 (Service Unavailable) - When gensim is not installed

Note: This endpoint requires gensim 4.3.0 or later. Install it with pip install "gensim>=4.3.0" (quote the requirement so the shell does not treat > as an output redirect). For better stemming support, also install NLTK: pip install nltk.

DELETE `/clean` - Clean Database

Clear all data from the Neo4j database.

⚠️ WARNING: This permanently deletes all data!

Response:

{
  "status": "success",
  "message": "Database cleaned successfully. Deleted 1523 nodes and 4567 relationships."
}

Usage Examples

Python

import requests

# Base URL
BASE_URL = "http://localhost:8000"

# Health check
response = requests.get(f"{BASE_URL}/health")
print(response.json())

# Ingest episodes
episodes_data = {
    "episodes": [
        {
            "content": "Kamala Harris is the Attorney General of California.",
            "type": "text",
            "description": "Biography"
        }
    ],
    "enable_ner": True
}
response = requests.post(f"{BASE_URL}/ingest", json=episodes_data)
print(response.json())

# Search
search_data = {
    "query": "Who is Kamala Harris?",
    "method": "question-aware",
    "limit": 5
}
response = requests.post(f"{BASE_URL}/search", json=search_data)
results = response.json()
print(f"Found {results['total_results']} results")
for result in results['results']:
    print(f"{result['rank']}. {result['name']} (score: {result['scoring']['final_score']:.4f})")

# Extract keywords
keywords_data = {
    "text": "Artificial intelligence and machine learning are transforming technology.",
    "method": "tfidf",
    "num_keywords": 10
}
response = requests.post(f"{BASE_URL}/keywords", json=keywords_data)
keywords = response.json()
print(f"Extracted {keywords['num_keywords']} keywords:")
for kw in keywords['keywords']:
    print(f"  - {kw['keyword']} (score: {kw['score']:.4f})")

JavaScript/TypeScript

const BASE_URL = 'http://localhost:8000';

// Health check
const health = await fetch(`${BASE_URL}/health`);
const healthData = await health.json();
console.log(healthData);

// Ingest episodes
const ingestResponse = await fetch(`${BASE_URL}/ingest`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    episodes: [
      {
        content: 'Kamala Harris is the Attorney General of California.',
        type: 'text',
        description: 'Biography'
      }
    ],
    enable_ner: true
  })
});
const ingestData = await ingestResponse.json();
console.log(ingestData);

// Search
const searchResponse = await fetch(`${BASE_URL}/search`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'Who is Kamala Harris?',
    method: 'question-aware',
    limit: 5
  })
});
const searchData = await searchResponse.json();
console.log(`Found ${searchData.total_results} results`);
searchData.results.forEach(result => {
  console.log(`${result.rank}. ${result.name} (score: ${result.scoring.final_score})`);
});

// Extract keywords
const keywordsResponse = await fetch(`${BASE_URL}/keywords`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: 'Artificial intelligence and machine learning are transforming technology.',
    method: 'tfidf',
    num_keywords: 10
  })
});
const keywordsData = await keywordsResponse.json();
console.log(`Extracted ${keywordsData.num_keywords} keywords:`);
keywordsData.keywords.forEach(kw => {
  console.log(`  - ${kw.keyword} (score: ${kw.score})`);
});

cURL

# Health check
curl http://localhost:8000/health

# Ingest episodes
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "episodes": [
      {
        "content": "Kamala Harris is the Attorney General of California.",
        "type": "text",
        "description": "Biography"
      }
    ],
    "enable_ner": true
  }'

# Search
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Who is Kamala Harris?",
    "method": "question-aware",
    "limit": 5
  }'

# Extract keywords
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Artificial intelligence and machine learning are transforming technology.",
    "method": "tfidf",
    "num_keywords": 10
  }'

# Clean database
curl -X DELETE http://localhost:8000/clean

Error Handling

All endpoints return standard HTTP status codes:

200 - Success
400 - Bad Request (invalid input)
404 - Not Found
500 - Internal Server Error
503 - Service Unavailable (health check failed, or gensim is not installed for /keywords)

Error response format:

{
  "detail": "Error message describing what went wrong"
}
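
Client code can turn this format into exceptions so that failures are not silently ignored. The sketch below is one possible approach: PaleFireAPIError and raise_for_error are hypothetical names, and the only thing assumed about the API is the "detail" field shown above.

```python
import json


class PaleFireAPIError(Exception):
    """Raised when the API answers with a non-2xx status code."""

    def __init__(self, status_code: int, detail: str):
        super().__init__(f"HTTP {status_code}: {detail}")
        self.status_code = status_code
        self.detail = detail


def raise_for_error(status_code: int, body: str) -> None:
    """Raise PaleFireAPIError for non-2xx responses, using "detail" when present."""
    if 200 <= status_code < 300:
        return
    try:
        parsed = json.loads(body)
        detail = parsed.get("detail", body) if isinstance(parsed, dict) else body
    except ValueError:
        detail = body  # body was not JSON, so report it verbatim
    raise PaleFireAPIError(status_code, str(detail))
```

With the requests library this could be called as raise_for_error(response.status_code, response.text) before reading response.json().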

Authentication

The API does not currently implement authentication. For production use, consider adding an authentication scheme and request throttling, for example:

API keys
OAuth 2.0
JWT tokens
Rate limiting
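
As an illustration of the first option, an API key check can be as small as a constant-time string comparison. This is a sketch only: the X-API-Key header name is an assumption, nothing like this exists in Pale Fire today, and wiring it into the FastAPI application is left out.

```python
import hmac


def is_authorized(headers: dict, expected_key: str) -> bool:
    """Check a request's API key header against the expected key."""
    if not expected_key:
        return False  # refuse all requests when no key is configured
    # HTTP header names are case-insensitive, so normalize before the lookup
    normalized = {name.lower(): value for name, value in headers.items()}
    provided = normalized.get("x-api-key", "")  # hypothetical header name
    # compare_digest avoids leaking key contents through timing differences
    return hmac.compare_digest(provided.encode(), expected_key.encode())
```

The expected key would typically come from an environment variable or a secrets manager rather than from source code.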

CORS Configuration

The API currently allows all origins (*). For production, restrict allowed origins to the domains that host your front end:

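from fastapi.middleware.cors import CORSMiddleware

# app is the FastAPI instance defined in api.py
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],  # replace with your own origins
    allow_credentials=True,
    allow_methods=["GET", "POST", "DELETE"],
    allow_headers=["*"],
)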

Deployment

Production Server

# Install production server
pip install gunicorn

# Run with gunicorn
gunicorn api:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000

Docker

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]

# Build and run
docker build -t palefire-api .
docker run -p 8000:8000 --env-file .env palefire-api

Docker Compose

version: '3.8'

services:
  neo4j:
    image: neo4j:5.13
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_AUTH: neo4j/password
    volumes:
      - neo4j_data:/data

  palefire-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      NEO4J_URI: bolt://neo4j:7687
      NEO4J_USER: neo4j
      NEO4J_PASSWORD: password
    depends_on:
      - neo4j

volumes:
  neo4j_data:

Performance

Response Times

| Endpoint | Typical Response Time |
|---|---|
| `/health` | 10-50ms |
| `/config` | < 5ms |
| `/ingest` | 100-500ms per episode |
| `/search` (standard) | 100-300ms |
| `/search` (question-aware) | 500-2000ms |
| `/keywords` | 50-500ms (depends on text length and method) |
| `/clean` | 100-5000ms (depends on data size) |

Optimization Tips

Use connection pooling - Configure Neo4j driver appropriately
Enable caching - Cache frequent queries
Batch ingestion - Ingest multiple episodes at once
Async operations - API is fully async for better concurrency
Load balancing - Run multiple instances behind a load balancer
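
The caching tip can be sketched on the client side with a small time-based cache, so that repeated identical searches within a short window skip the slower question-aware path. TTLCache is a hypothetical helper, not part of Pale Fire, and the 60-second default lifetime is an arbitrary assumption.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Cache values for a fixed number of seconds."""

    def __init__(self, ttl_seconds: float = 60.0,
                 clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, or call compute() and store its result."""
        now = self._clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = compute()
        self._entries[key] = (now, value)
        return value
```

A natural cache key is the tuple (query, method, limit), with compute performing the actual POST to /search. Cached results go stale after new episodes are ingested, so a short lifetime is the safer choice.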

Monitoring

Health Checks

Use the /health endpoint for monitoring:

# Kubernetes liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10

Logging

The API logs to stdout. Set the log level with the LOG_LEVEL environment variable:

LOG_LEVEL=DEBUG python api.py
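
For scripts that run alongside the API, the same variable can drive Python's standard logging module. This sketch shows one way to map LOG_LEVEL to a logging level with a safe fallback. It does not describe how api.py itself reads the variable, and the function names are illustrative.

```python
import logging
import os

_LEVELS = {
    "CRITICAL": logging.CRITICAL,
    "ERROR": logging.ERROR,
    "WARNING": logging.WARNING,
    "INFO": logging.INFO,
    "DEBUG": logging.DEBUG,
}


def resolve_log_level(name, default: int = logging.INFO) -> int:
    """Map a level name such as "debug" to a logging constant, falling back to default."""
    if not name:
        return default
    return _LEVELS.get(name.strip().upper(), default)


def configure_logging() -> None:
    """Configure root logging to stdout-style output from the LOG_LEVEL variable."""
    logging.basicConfig(
        level=resolve_log_level(os.environ.get("LOG_LEVEL")),
        format="%(asctime)s %(levelname)s %(name)s: %(message)s",
    )
```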

Metrics

Consider adding:

Prometheus metrics
Request/response times
Error rates
Database query performance
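
Before adopting a full metrics stack, request times and error rates can be collected in-process. The following is a minimal sketch using only the standard library; RequestMetrics is a hypothetical name and the data lives in memory only, so it is a starting point rather than a replacement for Prometheus.

```python
import time
from collections import defaultdict
from contextlib import contextmanager


class RequestMetrics:
    """Record per-endpoint durations and error counts in memory."""

    def __init__(self):
        self.durations = defaultdict(list)
        self.errors = defaultdict(int)

    @contextmanager
    def track(self, endpoint: str):
        """Time the wrapped block and count it as an error if it raises."""
        start = time.perf_counter()
        try:
            yield
        except Exception:
            self.errors[endpoint] += 1
            raise
        finally:
            self.durations[endpoint].append(time.perf_counter() - start)

    def error_rate(self, endpoint: str) -> float:
        """Return the fraction of tracked calls to endpoint that raised."""
        total = len(self.durations[endpoint])
        return self.errors[endpoint] / total if total else 0.0
```

Wrapping a call as with metrics.track("/search"): ... records its duration whether it succeeds or fails.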

Troubleshooting

API Won't Start

Problem: Failed to initialize Pale Fire API

Solutions:

Check Neo4j is running
Verify .env configuration
Check network connectivity
Review logs for specific error

Search Returns No Results

Problem: Empty results array

Solutions:

Check database has data: GET /health
Try simpler query
Verify data was ingested successfully
Check search method compatibility
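
When diagnosing empty results, it can be useful to try each search method in turn and see which one returns anything. The sketch below takes the actual HTTP call as a parameter so it stays independent of any client library. The function name and the order in which methods are tried are assumptions, not a documented recommendation.

```python
from typing import Callable, Sequence

FALLBACK_ORDER = ("question-aware", "connection", "standard")  # assumed order


def search_with_fallback(search: Callable[[dict], dict], query: str, limit: int = 5,
                         methods: Sequence[str] = FALLBACK_ORDER) -> dict:
    """Call search() with each method in turn and return the first non-empty response."""
    last = {"query": query, "results": [], "total_results": 0}
    for method in methods:
        last = search({"query": query, "method": method, "limit": limit})
        if last.get("results"):
            return last
    return last  # every method came back empty
```

Here search would be a function that posts its argument to /search and returns the decoded JSON.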

Slow Response Times

Problem: API is slow

Solutions:

Check Neo4j performance
Reduce search limit
Use simpler search method
Enable query caching
Scale horizontally
