Pale Fire API Guide

Overview

Pale Fire provides a RESTful API built with FastAPI for integrating knowledge graph search into your applications.

Quick Start

1. Install Dependencies

cd /path/to/palefire
pip install -r requirements.txt

2. Configure Environment

cp env.example .env
# Edit .env with your settings

3. Start the API Server

python api.py

Or run uvicorn directly (the --reload flag restarts the server on code changes, so use it for development only):

uvicorn api:app --host 0.0.0.0 --port 8000 --reload

4. Access the API

  • API Base URL: http://localhost:8000
  • Interactive Docs: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

API Endpoints

GET / - Root

Get API status.

Response:

{
  "status": "ok",
  "message": "Pale Fire API is running"
}

GET /health - Health Check

Check API and database health.

Response:

{
  "status": "healthy",
  "message": "All systems operational",
  "database_stats": {
    "nodes": 1523,
    "relationships": 4567
  }
}
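Right after startup, or while Neo4j is still coming up, /health may refuse connections or report a status other than "healthy". Scripts can poll it before sending traffic. The sketch below is a minimal version that relies only on the status field shown above. wait_until_healthy and fetch_health are hypothetical helper names. Treating every status other than "healthy" as not ready is this sketch's own choice.

```python
import json
import time
import urllib.request
from typing import Callable, Optional


def fetch_health(base_url: str = "http://localhost:8000") -> dict:
    """GET /health and return the parsed JSON body (raises OSError on failure)."""
    with urllib.request.urlopen(f"{base_url}/health", timeout=5) as resp:
        return json.load(resp)


def wait_until_healthy(
    fetch: Callable[[], dict] = fetch_health,
    timeout: float = 30.0,
    interval: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> dict:
    """Poll until /health reports "healthy"; raise TimeoutError after `timeout` seconds."""
    deadline = clock() + timeout
    last_error: Optional[Exception] = None
    while True:
        try:
            payload = fetch()
            if payload.get("status") == "healthy":
                return payload
        except OSError as exc:
            # Connection refused while booting, or HTTPError (an OSError subclass) for a 503
            last_error = exc
        if clock() >= deadline:
            raise TimeoutError(f"API not healthy after {timeout}s (last error: {last_error})")
        sleep(interval)
```

In a deployment script you might call wait_until_healthy(timeout=60) before starting ingestion.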

GET /config - Get Configuration

Get current configuration.

Response:

{
  "neo4j_uri": "bolt://localhost:7687",
  "llm_provider": "ollama",
  "llm_model": "deepseek-r1:7b",
  "embedder_model": "nomic-embed-text",
  "search_method": "question-aware",
  "search_limit": 20,
  "ner_enabled": true
}

POST /ingest - Ingest Episodes

Ingest episodes into the knowledge graph.

Request Body:

{
  "episodes": [
    {
      "content": "Kamala Harris is the Attorney General of California.",
      "type": "text",
      "description": "Biography"
    },
    {
      "content": {
        "name": "Gavin Newsom",
        "position": "Governor",
        "state": "California"
      },
      "type": "json",
      "description": "Structured data"
    }
  ],
  "enable_ner": true
}

Response:

{
  "status": "success",
  "message": "Successfully ingested 2 episodes"
}
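/ingest accepts a list of episodes, so a large import can be split across several requests. The sketch below assumes that plain strings map to type "text" and dicts map to type "json", as in the request body above. make_episode, ingest_batches and the default batch size of 50 are hypothetical; the guide does not state a per-request limit.

```python
from typing import Iterator, List, Union

EpisodeContent = Union[str, dict]


def make_episode(content: EpisodeContent, description: str) -> dict:
    """Build one episode object: strings become "text" episodes, dicts become "json"."""
    if isinstance(content, str):
        episode_type = "text"
    elif isinstance(content, dict):
        episode_type = "json"
    else:
        raise TypeError(f"unsupported content type: {type(content).__name__}")
    return {"content": content, "type": episode_type, "description": description}


def ingest_batches(episodes: List[dict], batch_size: int = 50, enable_ner: bool = True) -> Iterator[dict]:
    """Yield /ingest request bodies holding at most batch_size episodes each."""
    if batch_size < 1:
        raise ValueError("batch_size must be >= 1")
    for start in range(0, len(episodes), batch_size):
        yield {"episodes": episodes[start:start + batch_size], "enable_ner": enable_ner}
```

Each yielded body can be POSTed to /ingest as-is.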

POST /search - Search Knowledge Graph

Search the knowledge graph with intelligent ranking.

Request Body:

{
  "query": "Who was the California Attorney General in 2020?",
  "method": "question-aware",
  "limit": 5
}

Parameters:

  • query (required): Search query string
  • method (optional): Search method - standard, connection, or question-aware (default)
  • limit (optional): Maximum number of results (1-100, default: 5)
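A client-side check against these documented values catches bad arguments before the request is sent. The server still performs its own validation. build_search_payload is a hypothetical helper, and rejecting blank queries is this sketch's own choice.

```python
SEARCH_METHODS = {"standard", "connection", "question-aware"}


def build_search_payload(query: str, method: str = "question-aware", limit: int = 5) -> dict:
    """Validate arguments against the documented methods and range, then build the body."""
    if not query or not query.strip():
        raise ValueError("query must be a non-empty string")
    if method not in SEARCH_METHODS:
        raise ValueError(f"method must be one of {sorted(SEARCH_METHODS)}, got {method!r}")
    if not 1 <= limit <= 100:
        raise ValueError(f"limit must be between 1 and 100, got {limit}")
    return {"query": query, "method": method, "limit": limit}
```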

Response:

{
  "query": "Who was the California Attorney General in 2020?",
  "method": "question-aware",
  "total_results": 5,
  "timestamp": "2025-12-26T14:45:00.123456+00:00",
  "results": [
    {
      "rank": 1,
      "uuid": "abc123-def456-ghi789",
      "name": "Kamala Harris",
      "summary": "Attorney General of California from 2011 to 2017...",
      "labels": ["Person", "PoliticalFigure"],
      "attributes": {
        "position": "Attorney General",
        "state": "California"
      },
      "scoring": {
        "final_score": 0.9234,
        "original_score": 0.8456,
        "connection_score": 0.7823,
        "temporal_score": 1.0,
        "query_match_score": 0.9123,
        "entity_type_score": 2.0
      },
      "connections": {
        "count": 15,
        "entities": [
          {
            "name": "California",
            "type": "LOC",
            "labels": ["Entity", "LOC"],
            "uuid": "xyz789-abc123-def456"
          }
        ],
        "relationship_types": ["WORKED_AT", "LOCATED_IN"]
      },
      "recognized_entities": {
        "PER": ["Kamala Harris"],
        "LOC": ["California"],
        "ORG": ["Attorney General"]
      }
    }
  ]
}
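Printing or filtering results usually needs only rank, name and scoring.final_score. The sketch below assumes the response shape above. summarize_results and the min_score threshold are hypothetical. The guide does not document the score range, so a sensible threshold depends on your data.

```python
from typing import List, Tuple


def summarize_results(response: dict, min_score: float = 0.0) -> List[Tuple[int, str, float]]:
    """Return (rank, name, final_score) for results at or above min_score, ordered by rank."""
    rows = [
        (r["rank"], r["name"], r["scoring"]["final_score"])
        for r in response.get("results", [])
        if r["scoring"]["final_score"] >= min_score
    ]
    return sorted(rows, key=lambda row: row[0])
```

For example: for rank, name, score in summarize_results(results, min_score=0.5): print(rank, name, score).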

POST /keywords - Extract Keywords

Extract important keywords from text using Gensim with configurable weights and methods.

Request Body:

{
  "text": "Artificial intelligence and machine learning are transforming technology. Deep learning algorithms process data efficiently.",
  "method": "tfidf",
  "num_keywords": 10,
  "min_word_length": 3,
  "max_word_length": 50,
  "use_stemming": false,
  "tfidf_weight": 1.0,
  "textrank_weight": 0.5,
  "word_freq_weight": 0.3,
  "position_weight": 0.2,
  "title_weight": 2.0,
  "first_sentence_weight": 1.5,
  "documents": null
}

Request Parameters:

  • text (required): Text to extract keywords from
  • method (optional): Extraction method (tfidf, textrank, word_freq, combined) - default: tfidf
  • num_keywords (optional): Number of keywords to extract (1-100) - default: 10
  • min_word_length (optional): Minimum word length (1-50) - default: 3
  • max_word_length (optional): Maximum word length (1-100) - default: 50
  • use_stemming (optional): Enable stemming - default: false
  • tfidf_weight (optional): Weight for TF-IDF scores (combined method) - default: 1.0
  • textrank_weight (optional): Weight for TextRank scores (combined method) - default: 0.5
  • word_freq_weight (optional): Weight for word frequency scores (combined method) - default: 0.3
  • position_weight (optional): Weight for position-based scoring - default: 0.2
  • title_weight (optional): Weight multiplier for words in titles/headers - default: 2.0
  • first_sentence_weight (optional): Weight multiplier for first sentence words - default: 1.5
  • documents (optional): Array of strings for document corpus (for IDF calculation in TF-IDF method)
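The same documented ranges can be checked before calling /keywords. In the sketch below, build_keywords_payload is hypothetical. It rejects unknown fields to catch typos. It also warns when the three method weights are passed to a method other than combined, because the parameter list says they apply to the combined method; the server may simply ignore them. Requiring min_word_length <= max_word_length is an assumption.

```python
import warnings

KEYWORD_METHODS = {"tfidf", "textrank", "word_freq", "combined"}
# Integer bounds taken from the parameter list above
RANGES = {"num_keywords": (1, 100), "min_word_length": (1, 50), "max_word_length": (1, 100)}
OPTIONAL_FIELDS = set(RANGES) | {
    "use_stemming", "tfidf_weight", "textrank_weight", "word_freq_weight",
    "position_weight", "title_weight", "first_sentence_weight", "documents",
}
COMBINED_ONLY = ("tfidf_weight", "textrank_weight", "word_freq_weight")


def build_keywords_payload(text: str, method: str = "tfidf", **options) -> dict:
    """Validate /keywords arguments against the documented ranges and build the body."""
    if method not in KEYWORD_METHODS:
        raise ValueError(f"method must be one of {sorted(KEYWORD_METHODS)}, got {method!r}")
    unknown = set(options) - OPTIONAL_FIELDS
    if unknown:
        raise ValueError(f"unknown fields: {sorted(unknown)}")
    for name, (low, high) in RANGES.items():
        if name in options and not low <= options[name] <= high:
            raise ValueError(f"{name} must be between {low} and {high}, got {options[name]}")
    # Assumption: a minimum above the maximum could never match any word
    if options.get("min_word_length", 3) > options.get("max_word_length", 50):
        raise ValueError("min_word_length must not exceed max_word_length")
    if method != "combined" and any(name in options for name in COMBINED_ONLY):
        warnings.warn(f"{', '.join(COMBINED_ONLY)} only affect the combined method", stacklevel=2)
    return {"text": text, "method": method, **options}
```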

Response:

{
  "method": "tfidf",
  "num_keywords": 10,
  "keywords": [
    {
      "keyword": "artificial",
      "score": 0.8234
    },
    {
      "keyword": "intelligence",
      "score": 0.7123
    },
    {
      "keyword": "machine",
      "score": 0.6543
    },
    {
      "keyword": "learning",
      "score": 0.6123
    },
    {
      "keyword": "deep",
      "score": 0.5891
    }
  ],
  "parameters": {
    "num_keywords": 10,
    "min_word_length": 3,
    "max_word_length": 50,
    "use_stemming": false,
    "tfidf_weight": 1.0,
    "textrank_weight": 0.5,
    "word_freq_weight": 0.3,
    "position_weight": 0.2,
    "title_weight": 2.0,
    "first_sentence_weight": 1.5
  },
  "timestamp": "2025-01-15T10:30:00.123456+00:00"
}

Example Requests:

# Basic keyword extraction
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Artificial intelligence and machine learning are transforming technology.",
    "num_keywords": 5
  }'

# Using TextRank method
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "method": "textrank",
    "num_keywords": 10
  }'

# Combined method with custom weights
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "method": "combined",
    "tfidf_weight": 1.5,
    "textrank_weight": 0.8,
    "title_weight": 3.0
  }'

# With document corpus for better IDF calculation
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "method": "tfidf",
    "documents": [
      "Document 1 text...",
      "Document 2 text...",
      "Document 3 text..."
    ]
  }'

Error Responses:

{
  "detail": "Keyword extraction not available. Install gensim: pip install gensim>=4.3.0"
}

Status Code: 503 (Service Unavailable) - When gensim is not installed

Note: This endpoint requires gensim 4.3.0 or later. Install it with pip install "gensim>=4.3.0". The quotes stop the shell from treating > as output redirection. For better stemming support, also install NLTK: pip install nltk.

DELETE /clean - Clean Database

Clear all data from the Neo4j database.

⚠️ WARNING: This permanently deletes all data!

Response:

{
  "status": "success",
  "message": "Database cleaned successfully. Deleted 1523 nodes and 4567 relationships."
}
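Because /clean is irreversible, scripts that call it can require an explicit confirmation first. The sketch below uses only the standard library. build_clean_request and the confirmation phrase are a hypothetical client-side safeguard, not part of the API.

```python
import urllib.request

CONFIRM_PHRASE = "DELETE ALL DATA"  # hypothetical client-side safeguard


def build_clean_request(base_url: str, confirm: str) -> urllib.request.Request:
    """Return a DELETE /clean request, but only if the caller supplied the confirmation phrase."""
    if confirm != CONFIRM_PHRASE:
        raise PermissionError(f"refusing to clean the database; pass confirm={CONFIRM_PHRASE!r}")
    return urllib.request.Request(f"{base_url.rstrip('/')}/clean", method="DELETE")


# Usage (this actually sends the request and deletes everything):
# import json
# req = build_clean_request("http://localhost:8000", confirm=input("Type DELETE ALL DATA: "))
# with urllib.request.urlopen(req, timeout=60) as resp:
#     print(json.load(resp)["message"])
```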

Usage Examples

Python

import requests

# Base URL
BASE_URL = "http://localhost:8000"
# Request timeout in seconds; question-aware search can take ~2 s, so leave headroom
TIMEOUT = 30

# Health check
response = requests.get(f"{BASE_URL}/health", timeout=TIMEOUT)
print(response.json())

# Ingest episodes
episodes_data = {
    "episodes": [
        {
            "content": "Kamala Harris is the Attorney General of California.",
            "type": "text",
            "description": "Biography"
        }
    ],
    "enable_ner": True
}
response = requests.post(f"{BASE_URL}/ingest", json=episodes_data, timeout=TIMEOUT)
response.raise_for_status()  # raise on 4xx/5xx instead of printing an error body
print(response.json())

# Search
search_data = {
    "query": "Who is Kamala Harris?",
    "method": "question-aware",
    "limit": 5
}
response = requests.post(f"{BASE_URL}/search", json=search_data, timeout=TIMEOUT)
response.raise_for_status()
results = response.json()
print(f"Found {results['total_results']} results")
for result in results['results']:
    print(f"{result['rank']}. {result['name']} (score: {result['scoring']['final_score']:.4f})")

# Extract keywords
keywords_data = {
    "text": "Artificial intelligence and machine learning are transforming technology.",
    "method": "tfidf",
    "num_keywords": 10
}
response = requests.post(f"{BASE_URL}/keywords", json=keywords_data, timeout=TIMEOUT)
response.raise_for_status()  # 503 if gensim is not installed on the server
keywords = response.json()
# Count the returned list: num_keywords in the response can exceed the number actually found
print(f"Extracted {len(keywords['keywords'])} keywords:")
for kw in keywords['keywords']:
    print(f"  - {kw['keyword']} (score: {kw['score']:.4f})")

JavaScript/TypeScript

// Top-level await requires an ES module (e.g. a .mjs file) or an async function wrapper
const BASE_URL = 'http://localhost:8000';

// Health check
const health = await fetch(`${BASE_URL}/health`);
const healthData = await health.json();
console.log(healthData);

// Ingest episodes
const ingestResponse = await fetch(`${BASE_URL}/ingest`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    episodes: [
      {
        content: 'Kamala Harris is the Attorney General of California.',
        type: 'text',
        description: 'Biography'
      }
    ],
    enable_ner: true
  })
});
const ingestData = await ingestResponse.json();
console.log(ingestData);

// Search
const searchResponse = await fetch(`${BASE_URL}/search`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'Who is Kamala Harris?',
    method: 'question-aware',
    limit: 5
  })
});
const searchData = await searchResponse.json();
console.log(`Found ${searchData.total_results} results`);
searchData.results.forEach(result => {
  console.log(`${result.rank}. ${result.name} (score: ${result.scoring.final_score.toFixed(4)})`);
});

// Extract keywords
const keywordsResponse = await fetch(`${BASE_URL}/keywords`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: 'Artificial intelligence and machine learning are transforming technology.',
    method: 'tfidf',
    num_keywords: 10
  })
});
const keywordsData = await keywordsResponse.json();
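// Count the returned list: num_keywords can exceed the number of keywords actually found
console.log(`Extracted ${keywordsData.keywords.length} keywords:`);
keywordsData.keywords.forEach(kw => {
  console.log(`  - ${kw.keyword} (score: ${kw.score.toFixed(4)})`);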
});

cURL

# Health check
curl http://localhost:8000/health

# Ingest episodes
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "episodes": [
      {
        "content": "Kamala Harris is the Attorney General of California.",
        "type": "text",
        "description": "Biography"
      }
    ],
    "enable_ner": true
  }'

# Search
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Who is Kamala Harris?",
    "method": "question-aware",
    "limit": 5
  }'

# Extract keywords
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Artificial intelligence and machine learning are transforming technology.",
    "method": "tfidf",
    "num_keywords": 10
  }'

# Clean database
curl -X DELETE http://localhost:8000/clean

Error Handling

All endpoints return standard HTTP status codes:

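  • 200 - Success
  • 400 - Bad Request (invalid input)
  • 404 - Not Found
  • 422 - Unprocessable Entity (request body failed schema validation, such as a missing required field; FastAPI's default for validation errors)
  • 500 - Internal Server Error
  • 503 - Service Unavailable (health check failed, or gensim is not installed for /keywords)

Error response format. For 422 responses, FastAPI returns detail as a list of objects with loc, msg and type fields rather than as a string: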

{
  "detail": "Error message describing what went wrong"
}
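Clients can turn these responses into readable exceptions. The sketch below handles the {"detail": ...} format above. It also handles FastAPI's default 422 validation format, where detail is a list of objects with loc, msg and type fields. PaleFireAPIError and error_from_response are hypothetical names.

```python
import json


class PaleFireAPIError(Exception):
    """Raised for non-2xx responses; carries the HTTP status and the detail message."""

    def __init__(self, status: int, detail: str):
        super().__init__(f"HTTP {status}: {detail}")
        self.status = status
        self.detail = detail


def error_from_response(status: int, body: bytes) -> PaleFireAPIError:
    """Build an exception from an error response body."""
    try:
        detail = json.loads(body).get("detail", "")
    except (ValueError, AttributeError):
        # Body was not a JSON object (e.g. an HTML error page from a proxy)
        detail = body.decode("utf-8", errors="replace")
    if isinstance(detail, list):
        # FastAPI validation errors (422): [{"loc": [...], "msg": "...", "type": "..."}, ...]
        detail = "; ".join(
            f"{'.'.join(str(part) for part in item.get('loc', []))}: {item.get('msg', '')}"
            for item in detail
        )
    return PaleFireAPIError(status, str(detail))
```

With urllib, catch urllib.error.HTTPError as exc and raise error_from_response(exc.code, exc.read()) from exc.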

Authentication

Currently, the API does not implement authentication. For production use, consider adding:

  • API keys
  • OAuth 2.0
  • JWT tokens
  • Rate limiting
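The sketch below shows two of these building blocks in plain Python: a constant-time API key comparison and a token-bucket rate limiter. In FastAPI you would call them from a dependency or middleware that reads a request header and responds with 401 or 429. That wiring, the function and class names, and the header you choose are assumptions, not part of Pale Fire.

```python
import hmac
import time
from typing import Callable, Optional


def api_key_is_valid(presented: Optional[str], expected: str) -> bool:
    """Compare keys in constant time so timing does not reveal how much of a key matched."""
    if not presented:
        return False
    return hmac.compare_digest(presented.encode(), expected.encode())


class TokenBucket:
    """Allow bursts of up to `capacity` requests, refilling at `rate` tokens per second."""

    def __init__(self, capacity: float, rate: float, clock: Callable[[], float] = time.monotonic):
        self.capacity = capacity
        self.rate = rate
        self.clock = clock
        self.tokens = capacity
        self.updated = clock()

    def allow(self) -> bool:
        now = self.clock()
        # Refill for the time elapsed since the last call, capped at capacity
        self.tokens = min(self.capacity, self.tokens + (now - self.updated) * self.rate)
        self.updated = now
        if self.tokens >= 1:
            self.tokens -= 1
            return True
        return False
```

Each gunicorn worker is a separate process with its own buckets. With -w 4 the effective limit is therefore roughly four times the configured rate. A global limit needs shared state, such as Redis.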

CORS Configuration

The API currently allows all origins (*). For production, restrict it to the origins that actually call the API:

from fastapi.middleware.cors import CORSMiddleware

app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],  # list each allowed origin explicitly
    allow_credentials=True,
    allow_methods=["GET", "POST", "DELETE"],
    allow_headers=["*"],
)

Deployment

Production Server

# Install production server
pip install gunicorn

# Run with gunicorn
gunicorn api:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
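The same settings can live in a gunicorn config file. The sketch below replaces the fixed -w 4 with gunicorn's documented starting point of (2 x CPU cores) + 1 workers. The file name gunicorn.conf.py is an assumption. So is the 120-second timeout, which is raised from gunicorn's 30-second default to leave room for large /ingest batches. Run it with gunicorn -c gunicorn.conf.py api:app.

```python
# gunicorn.conf.py: settings equivalent to the command line above
import multiprocessing

bind = "0.0.0.0:8000"
# Gunicorn's rule of thumb; each worker is a separate process with its own
# database connections, so check the total against what Neo4j can handle
workers = multiprocessing.cpu_count() * 2 + 1
worker_class = "uvicorn.workers.UvicornWorker"
# Assumption: allow slow requests (large ingests) more than the 30 s default
timeout = 120
```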

Docker

FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

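CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]

# Build and run
docker build -t palefire-api .
docker run -p 8000:8000 --env-file .env palefire-api

Inside the container, localhost refers to the container itself. NEO4J_URI (and any local LLM endpoint) in .env must therefore point to a host the container can reach.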

Docker Compose

version: '3.8'

services:
  neo4j:
    image: neo4j:5.13
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_AUTH: neo4j/password
    volumes:
      - neo4j_data:/data

  palefire-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      NEO4J_URI: bolt://neo4j:7687
      NEO4J_USER: neo4j
      NEO4J_PASSWORD: password
    depends_on:
      - neo4j

volumes:
  neo4j_data:
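depends_on only controls start order. Compose starts neo4j first but does not wait for it to accept Bolt connections, so palefire-api can fail with "Failed to initialize Pale Fire API" on a cold start. One option is a short wait step before launching uvicorn. The standard-library sketch below implements one. wait_for_port is a hypothetical helper, and an open port only approximates readiness.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 60.0, interval: float = 1.0) -> bool:
    """Return True once a TCP connection to host:port succeeds, or False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Refused or timed out: Neo4j is not listening yet
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

For example, an entrypoint script could call wait_for_port("neo4j", 7687) and exit non-zero on False before starting uvicorn. An alternative is Compose's long-form depends_on with condition: service_healthy, combined with a healthcheck on the neo4j service.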

Performance

Response Times

Endpoint                    Typical response time
/health                     10-50 ms
/config                     < 5 ms
/ingest                     100-500 ms per episode
/search (standard)          100-300 ms
/search (question-aware)    500-2000 ms
/keywords                   50-500 ms (depends on text length and method)
/clean                      100-5000 ms (depends on data size)
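These figures vary with hardware, graph size and LLM backend. The standard-library sketch below measures your own deployment: it times repeated calls and reports median and 95th-percentile latency. measure_latency_ms is hypothetical, and the call you pass in (for example, a function that fetches /health) is up to you.

```python
import statistics
import time
from typing import Callable


def measure_latency_ms(call: Callable[[], object], runs: int = 20) -> dict:
    """Time `call` repeatedly; report median, 95th percentile and max latency in ms."""
    if runs < 2:
        raise ValueError("runs must be >= 2 to compute percentiles")
    samples = []
    for _ in range(runs):
        start = time.perf_counter()
        call()
        samples.append((time.perf_counter() - start) * 1000)
    # n=20 gives 19 cut points; index 18 is the 95th percentile.
    # "inclusive" keeps the estimate within the observed range.
    p95 = statistics.quantiles(samples, n=20, method="inclusive")[18]
    return {"runs": runs, "p50_ms": statistics.median(samples), "p95_ms": p95, "max_ms": max(samples)}
```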

Optimization Tips

  1. Tune connection pooling - Size the Neo4j driver's connection pool for your expected concurrency
  2. Enable caching - Cache frequent queries
  3. Batch ingestion - Ingest multiple episodes at once
  4. Async operations - API is fully async for better concurrency
  5. Load balancing - Run multiple instances behind a load balancer
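For tip 2, the sketch below is a minimal in-process cache for repeated searches, keyed on (query, method, limit). It runs on the client side and is not a Pale Fire feature. The TTLCache name and the 60-second default are assumptions. An expired entry is only replaced when the same key is requested again, so bound the cache in long-running processes.

```python
import time
from typing import Callable, Dict, Tuple


class TTLCache:
    """Cache responses for `ttl` seconds, e.g. keyed on (query, method, limit)."""

    def __init__(self, ttl: float = 60.0, clock: Callable[[], float] = time.monotonic):
        self.ttl = ttl
        self.clock = clock
        self._entries: Dict[Tuple, Tuple[float, dict]] = {}

    def get_or_fetch(self, key: Tuple, fetch: Callable[[], dict]) -> dict:
        now = self.clock()
        hit = self._entries.get(key)
        if hit is not None and now - hit[0] < self.ttl:
            return hit[1]  # still fresh
        value = fetch()
        self._entries[key] = (now, value)
        return value

    def clear(self) -> None:
        """Call after /ingest or /clean so stale results are not served."""
        self._entries.clear()
```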

Monitoring

Health Checks

Use the /health endpoint for monitoring:

# Kubernetes liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10

Logging

The API logs to stdout. Configure log level via environment:

LOG_LEVEL=DEBUG python api.py

Metrics

Consider adding:

  • Prometheus metrics
  • Request/response times
  • Error rates
  • Database query performance
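The dependency-free sketch below keeps the counters behind request rates, error rates and latency. In FastAPI you might call record() from an HTTP middleware. That wiring and the RequestMetrics name are hypothetical, and counting only 5xx responses as errors is a design choice. For Prometheus, the prometheus_client library's Counter and Histogram types cover the same ground.

```python
import threading
from collections import defaultdict


class RequestMetrics:
    """Thread-safe per-endpoint counters: requests, errors (status >= 500), total latency."""

    def __init__(self):
        self._lock = threading.Lock()
        self._requests = defaultdict(int)
        self._errors = defaultdict(int)
        self._latency_s = defaultdict(float)

    def record(self, endpoint: str, status: int, seconds: float) -> None:
        with self._lock:
            self._requests[endpoint] += 1
            self._latency_s[endpoint] += seconds
            if status >= 500:
                self._errors[endpoint] += 1

    def snapshot(self) -> dict:
        """Return requests, error rate and average latency (ms) for each endpoint seen so far."""
        with self._lock:
            return {
                ep: {
                    "requests": n,
                    "error_rate": self._errors[ep] / n,
                    "avg_latency_ms": 1000 * self._latency_s[ep] / n,
                }
                for ep, n in self._requests.items()
            }
```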

Troubleshooting

API Won't Start

Problem: Failed to initialize Pale Fire API

Solutions:

  1. Check Neo4j is running
  2. Verify .env configuration
  3. Check network connectivity
  4. Review logs for specific error

Search Returns No Results

Problem: Empty results array

Solutions:

  1. Check database has data: GET /health
  2. Try simpler query
  3. Verify data was ingested successfully
  4. Try a different search method (standard, connection, or question-aware)

Slow Response Times

Problem: API is slow

Solutions:

  1. Check Neo4j performance
  2. Reduce search limit
  3. Use simpler search method
  4. Enable query caching
  5. Scale horizontally

Pale Fire API v1.0 - Knowledge Graph Search as a Service! 🚀