Pale Fire provides a RESTful API built with FastAPI for integrating knowledge graph search into your applications.
```bash
cd /path/to/palefire
pip install -r requirements.txt
cp env.example .env
# Edit .env with your settings
python api.py
```

Or with uvicorn directly:

```bash
uvicorn api:app --host 0.0.0.0 --port 8000 --reload
```

- API Base URL: http://localhost:8000
- Interactive Docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Get API status.
Response:
```json
{
  "status": "ok",
  "message": "Pale Fire API is running"
}
```

Check API and database health.
Response:
```json
{
  "status": "healthy",
  "message": "All systems operational",
  "database_stats": {
    "nodes": 1523,
    "relationships": 4567
  }
}
```

Get current configuration.
Response:
```json
{
  "neo4j_uri": "bolt://localhost:7687",
  "llm_provider": "ollama",
  "llm_model": "deepseek-r1:7b",
  "embedder_model": "nomic-embed-text",
  "search_method": "question-aware",
  "search_limit": 20,
  "ner_enabled": true
}
```

Ingest episodes into the knowledge graph.
Request Body:
```json
{
  "episodes": [
    {
      "content": "Kamala Harris is the Attorney General of California.",
      "type": "text",
      "description": "Biography"
    },
    {
      "content": {
        "name": "Gavin Newsom",
        "position": "Governor",
        "state": "California"
      },
      "type": "json",
      "description": "Structured data"
    }
  ],
  "enable_ner": true
}
```

Response:
```json
{
  "status": "success",
  "message": "Successfully ingested 2 episodes"
}
```

Search the knowledge graph with intelligent ranking.
Request Body:
```json
{
  "query": "Who was the California Attorney General in 2020?",
  "method": "question-aware",
  "limit": 5
}
```

Parameters:

- `query` (required): Search query string
- `method` (optional): Search method, one of `standard`, `connection`, or `question-aware` (default)
- `limit` (optional): Maximum number of results (1-100, default: 5)
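As a purely client-side convenience (not a feature of the API itself), the search method could be chosen heuristically based on whether the query looks like a question; `is_question` and `build_search_payload` below are hypothetical helpers, not part of Pale Fire:

```python
def is_question(query: str) -> bool:
    """Heuristic: treat queries that start with an interrogative word
    or end in '?' as questions."""
    interrogatives = ("who", "what", "when", "where", "why", "how", "which")
    q = query.strip().lower()
    return q.endswith("?") or q.startswith(interrogatives)

def build_search_payload(query: str, limit: int = 5) -> dict:
    """Build a /search request body, clamping limit to the API's 1-100 range
    and picking the method heuristically."""
    method = "question-aware" if is_question(query) else "standard"
    return {"query": query, "method": method, "limit": max(1, min(limit, 100))}
```

Non-question queries then use the cheaper `standard` method, while interrogative ones get `question-aware` ranking.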
Response:
```json
{
  "query": "Who was the California Attorney General in 2020?",
  "method": "question-aware",
  "total_results": 5,
  "timestamp": "2025-12-26T14:45:00.123456+00:00",
  "results": [
    {
      "rank": 1,
      "uuid": "abc123-def456-ghi789",
      "name": "Kamala Harris",
      "summary": "Attorney General of California from 2011 to 2017...",
      "labels": ["Person", "PoliticalFigure"],
      "attributes": {
        "position": "Attorney General",
        "state": "California"
      },
      "scoring": {
        "final_score": 0.9234,
        "original_score": 0.8456,
        "connection_score": 0.7823,
        "temporal_score": 1.0,
        "query_match_score": 0.9123,
        "entity_type_score": 2.0
      },
      "connections": {
        "count": 15,
        "entities": [
          {
            "name": "California",
            "type": "LOC",
            "labels": ["Entity", "LOC"],
            "uuid": "xyz789-abc123-def456"
          }
        ],
        "relationship_types": ["WORKED_AT", "LOCATED_IN"]
      },
      "recognized_entities": {
        "PER": ["Kamala Harris"],
        "LOC": ["California"],
        "ORG": ["Attorney General"]
      }
    }
  ]
}
```

Extract important keywords from text using Gensim, with configurable weights and methods.
Request Body:
```json
{
  "text": "Artificial intelligence and machine learning are transforming technology. Deep learning algorithms process data efficiently.",
  "method": "tfidf",
  "num_keywords": 10,
  "min_word_length": 3,
  "max_word_length": 50,
  "use_stemming": false,
  "tfidf_weight": 1.0,
  "textrank_weight": 0.5,
  "word_freq_weight": 0.3,
  "position_weight": 0.2,
  "title_weight": 2.0,
  "first_sentence_weight": 1.5,
  "documents": null
}
```

Request Parameters:

- `text` (required): Text to extract keywords from
- `method` (optional): Extraction method (`tfidf`, `textrank`, `word_freq`, `combined`); default: `tfidf`
- `num_keywords` (optional): Number of keywords to extract (1-100); default: 10
- `min_word_length` (optional): Minimum word length (1-50); default: 3
- `max_word_length` (optional): Maximum word length (1-100); default: 50
- `use_stemming` (optional): Enable stemming; default: false
- `tfidf_weight` (optional): Weight for TF-IDF scores (combined method); default: 1.0
- `textrank_weight` (optional): Weight for TextRank scores (combined method); default: 0.5
- `word_freq_weight` (optional): Weight for word frequency scores (combined method); default: 0.3
- `position_weight` (optional): Weight for position-based scoring; default: 0.2
- `title_weight` (optional): Weight multiplier for words in titles/headers; default: 2.0
- `first_sentence_weight` (optional): Weight multiplier for first-sentence words; default: 1.5
- `documents` (optional): Array of strings used as the document corpus (for IDF calculation in the TF-IDF method)
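For intuition about how the weight parameters interact, the `combined` method presumably blends the per-method scores linearly. The sketch below is an assumption about that formula, not the actual Gensim-based implementation:

```python
def combined_score(scores: dict[str, float],
                   tfidf_weight: float = 1.0,
                   textrank_weight: float = 0.5,
                   word_freq_weight: float = 0.3) -> float:
    """Blend per-method keyword scores with the configured weights
    (assumed to be a simple linear combination)."""
    return (tfidf_weight * scores.get("tfidf", 0.0)
            + textrank_weight * scores.get("textrank", 0.0)
            + word_freq_weight * scores.get("word_freq", 0.0))
```

Under this reading, raising `tfidf_weight` relative to the others makes the combined ranking track the TF-IDF ordering more closely.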
Response:
```json
{
  "method": "tfidf",
  "num_keywords": 10,
  "keywords": [
    {
      "keyword": "artificial",
      "score": 0.8234
    },
    {
      "keyword": "intelligence",
      "score": 0.7123
    },
    {
      "keyword": "machine",
      "score": 0.6543
    },
    {
      "keyword": "learning",
      "score": 0.6123
    },
    {
      "keyword": "deep",
      "score": 0.5891
    }
  ],
  "parameters": {
    "num_keywords": 10,
    "min_word_length": 3,
    "max_word_length": 50,
    "use_stemming": false,
    "tfidf_weight": 1.0,
    "textrank_weight": 0.5,
    "word_freq_weight": 0.3,
    "position_weight": 0.2,
    "title_weight": 2.0,
    "first_sentence_weight": 1.5
  },
  "timestamp": "2025-01-15T10:30:00.123456+00:00"
}
```

Example Requests:
```bash
# Basic keyword extraction
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Artificial intelligence and machine learning are transforming technology.",
    "num_keywords": 5
  }'

# Using the TextRank method
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "method": "textrank",
    "num_keywords": 10
  }'

# Combined method with custom weights
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "method": "combined",
    "tfidf_weight": 1.5,
    "textrank_weight": 0.8,
    "title_weight": 3.0
  }'

# With a document corpus for better IDF calculation
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Your text here",
    "method": "tfidf",
    "documents": [
      "Document 1 text...",
      "Document 2 text...",
      "Document 3 text..."
    ]
  }'
```

Error Responses:
```json
{
  "detail": "Keyword extraction not available. Install gensim: pip install gensim>=4.3.0"
}
```

Status Code: 503 (Service Unavailable), returned when gensim is not installed.

Note: This endpoint requires `gensim>=4.3.0`. Install it with `pip install gensim>=4.3.0`. For better stemming support, also install NLTK: `pip install nltk`.
Clear all data from the Neo4j database.
Response:
```json
{
  "status": "success",
  "message": "Database cleaned successfully. Deleted 1523 nodes and 4567 relationships."
}
```

Python:

```python
import requests

# Base URL
BASE_URL = "http://localhost:8000"

# Health check
response = requests.get(f"{BASE_URL}/health")
print(response.json())

# Ingest episodes
episodes_data = {
    "episodes": [
        {
            "content": "Kamala Harris is the Attorney General of California.",
            "type": "text",
            "description": "Biography"
        }
    ],
    "enable_ner": True
}
response = requests.post(f"{BASE_URL}/ingest", json=episodes_data)
print(response.json())

# Search
search_data = {
    "query": "Who is Kamala Harris?",
    "method": "question-aware",
    "limit": 5
}
response = requests.post(f"{BASE_URL}/search", json=search_data)
results = response.json()
print(f"Found {results['total_results']} results")
for result in results['results']:
    print(f"{result['rank']}. {result['name']} (score: {result['scoring']['final_score']:.4f})")

# Extract keywords
keywords_data = {
    "text": "Artificial intelligence and machine learning are transforming technology.",
    "method": "tfidf",
    "num_keywords": 10
}
response = requests.post(f"{BASE_URL}/keywords", json=keywords_data)
keywords = response.json()
print(f"Extracted {keywords['num_keywords']} keywords:")
for kw in keywords['keywords']:
    print(f"  - {kw['keyword']} (score: {kw['score']:.4f})")
```

JavaScript:

```javascript
const BASE_URL = 'http://localhost:8000';

// Health check
const health = await fetch(`${BASE_URL}/health`);
const healthData = await health.json();
console.log(healthData);

// Ingest episodes
const ingestResponse = await fetch(`${BASE_URL}/ingest`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    episodes: [
      {
        content: 'Kamala Harris is the Attorney General of California.',
        type: 'text',
        description: 'Biography'
      }
    ],
    enable_ner: true
  })
});
const ingestData = await ingestResponse.json();
console.log(ingestData);

// Search
const searchResponse = await fetch(`${BASE_URL}/search`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    query: 'Who is Kamala Harris?',
    method: 'question-aware',
    limit: 5
  })
});
const searchData = await searchResponse.json();
console.log(`Found ${searchData.total_results} results`);
searchData.results.forEach(result => {
  console.log(`${result.rank}. ${result.name} (score: ${result.scoring.final_score})`);
});

// Extract keywords
const keywordsResponse = await fetch(`${BASE_URL}/keywords`, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify({
    text: 'Artificial intelligence and machine learning are transforming technology.',
    method: 'tfidf',
    num_keywords: 10
  })
});
const keywordsData = await keywordsResponse.json();
console.log(`Extracted ${keywordsData.num_keywords} keywords:`);
keywordsData.keywords.forEach(kw => {
  console.log(`  - ${kw.keyword} (score: ${kw.score})`);
});
```

cURL:

```bash
# Health check
curl http://localhost:8000/health

# Ingest episodes
curl -X POST http://localhost:8000/ingest \
  -H "Content-Type: application/json" \
  -d '{
    "episodes": [
      {
        "content": "Kamala Harris is the Attorney General of California.",
        "type": "text",
        "description": "Biography"
      }
    ],
    "enable_ner": true
  }'

# Search
curl -X POST http://localhost:8000/search \
  -H "Content-Type: application/json" \
  -d '{
    "query": "Who is Kamala Harris?",
    "method": "question-aware",
    "limit": 5
  }'

# Extract keywords
curl -X POST http://localhost:8000/keywords \
  -H "Content-Type: application/json" \
  -d '{
    "text": "Artificial intelligence and machine learning are transforming technology.",
    "method": "tfidf",
    "num_keywords": 10
  }'

# Clean database
curl -X DELETE http://localhost:8000/clean
```

All endpoints return standard HTTP status codes:
- `200` - Success
- `400` - Bad Request (invalid input)
- `404` - Not Found
- `500` - Internal Server Error
- `503` - Service Unavailable (health check failed)
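On the client side, these codes suggest different handling: 4xx responses indicate a bad request and should not be retried, while 500/503 may be transient. A sketch of such a (hypothetical) client policy:

```python
def should_retry(status_code: int) -> bool:
    """Retry only on server-side or unavailability errors, never on client errors."""
    return status_code in (500, 503)

def describe(status_code: int) -> str:
    """Human-readable label for the status codes this API returns."""
    return {
        200: "Success",
        400: "Bad Request (invalid input)",
        404: "Not Found",
        500: "Internal Server Error",
        503: "Service Unavailable (health check failed)",
    }.get(status_code, "Unknown status")
```

A production client would additionally cap retries and back off between attempts.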
Error response format:
```json
{
  "detail": "Error message describing what went wrong"
}
```

Currently, the API does not implement authentication. For production use, consider adding:
- API keys
- OAuth 2.0
- JWT tokens
- Rate limiting
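As an illustration of the first option, an API-key check boils down to a constant-time comparison; the key store and function name below are hypothetical, and `hmac.compare_digest` avoids timing side channels:

```python
import hmac

# Hypothetical key store; in practice, load keys from a secrets manager or env var.
VALID_API_KEYS = {"example-key-1", "example-key-2"}

def verify_api_key(provided_key: str) -> bool:
    """Compare the provided key against each known key in constant time."""
    return any(hmac.compare_digest(provided_key, key) for key in VALID_API_KEYS)
```

In FastAPI this would typically be wired up as a dependency that reads an `X-API-Key` header and rejects the request with 401/403 when the check fails.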
The API currently allows all origins (`*`). For production, configure this appropriately:
```python
app.add_middleware(
    CORSMiddleware,
    allow_origins=["https://yourdomain.com"],
    allow_credentials=True,
    allow_methods=["GET", "POST", "DELETE"],
    allow_headers=["*"],
)
```

```bash
# Install production server
pip install gunicorn

# Run with gunicorn
gunicorn api:app -w 4 -k uvicorn.workers.UvicornWorker -b 0.0.0.0:8000
```

```dockerfile
FROM python:3.11-slim

WORKDIR /app

COPY requirements.txt .
RUN pip install --no-cache-dir -r requirements.txt

COPY . .

CMD ["uvicorn", "api:app", "--host", "0.0.0.0", "--port", "8000"]
```

```bash
# Build and run
docker build -t palefire-api .
docker run -p 8000:8000 --env-file .env palefire-api
```

```yaml
version: '3.8'

services:
  neo4j:
    image: neo4j:5.13
    ports:
      - "7474:7474"
      - "7687:7687"
    environment:
      NEO4J_AUTH: neo4j/password
    volumes:
      - neo4j_data:/data

  palefire-api:
    build: .
    ports:
      - "8000:8000"
    environment:
      NEO4J_URI: bolt://neo4j:7687
      NEO4J_USER: neo4j
      NEO4J_PASSWORD: password
    depends_on:
      - neo4j

volumes:
  neo4j_data:
```

| Endpoint | Typical Response Time |
|---|---|
| `/health` | 10-50ms |
| `/config` | < 5ms |
| `/ingest` | 100-500ms per episode |
| `/search` (standard) | 100-300ms |
| `/search` (question-aware) | 500-2000ms |
| `/keywords` | 50-500ms (depends on text length and method) |
| `/clean` | 100-5000ms (depends on data size) |
- Use connection pooling - Configure Neo4j driver appropriately
- Enable caching - Cache frequent queries
- Batch ingestion - Ingest multiple episodes at once
- Async operations - API is fully async for better concurrency
- Load balancing - Run multiple instances behind a load balancer
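The caching suggestion can start as simply as an in-process TTL cache keyed on the serialized search payload; a minimal sketch with illustrative names:

```python
import json
import time

class TTLCache:
    """Tiny time-based cache for search responses, keyed on the request payload."""

    def __init__(self, ttl_seconds: float = 60.0):
        self.ttl = ttl_seconds
        self._store = {}

    def _key(self, payload: dict) -> str:
        # Canonical JSON so {"a": 1, "b": 2} and {"b": 2, "a": 1} share an entry.
        return json.dumps(payload, sort_keys=True)

    def get(self, payload: dict):
        entry = self._store.get(self._key(payload))
        if entry is None:
            return None
        timestamp, value = entry
        if time.monotonic() - timestamp > self.ttl:
            del self._store[self._key(payload)]
            return None
        return value

    def put(self, payload: dict, value) -> None:
        self._store[self._key(payload)] = (time.monotonic(), value)
```

A short TTL keeps results reasonably fresh while absorbing bursts of identical `/search` requests; invalidate (or just let entries expire) after `/ingest` or `/clean`.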
Use the `/health` endpoint for monitoring:

```yaml
# Kubernetes liveness probe
livenessProbe:
  httpGet:
    path: /health
    port: 8000
  initialDelaySeconds: 30
  periodSeconds: 10
```

The API logs to stdout. Configure the log level via an environment variable:

```bash
LOG_LEVEL=DEBUG python api.py
```

Consider adding:
- Prometheus metrics
- Request/response times
- Error rates
- Database query performance
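Before adopting full Prometheus integration, request timings and error rates can be collected with a tiny in-process recorder; a sketch (names are illustrative, not part of the API):

```python
from collections import defaultdict

class RequestMetrics:
    """Record per-endpoint request durations and error counts."""

    def __init__(self):
        self.durations = defaultdict(list)  # endpoint -> list of durations (ms)
        self.errors = defaultdict(int)      # endpoint -> error count

    def record(self, endpoint: str, duration_ms: float, ok: bool = True) -> None:
        self.durations[endpoint].append(duration_ms)
        if not ok:
            self.errors[endpoint] += 1

    def average_ms(self, endpoint: str) -> float:
        samples = self.durations[endpoint]
        return sum(samples) / len(samples) if samples else 0.0
```

This could later be swapped for `prometheus_client` histograms and counters without changing the call sites.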
Problem: Failed to initialize Pale Fire API

Solutions:

- Check that Neo4j is running
- Verify the `.env` configuration
- Check network connectivity
- Review logs for the specific error
Problem: Empty results array

Solutions:

- Check that the database has data: `GET /health`
- Try a simpler query
- Verify that the data was ingested successfully
- Check search method compatibility
Problem: API is slow
Solutions:
- Check Neo4j performance
- Reduce search limit
- Use simpler search method
- Enable query caching
- Scale horizontally
- CLI Guide - Command-line interface
- Configuration - Configuration options
- Quick Reference - Quick commands
Pale Fire API v1.0 - Knowledge Graph Search as a Service! 🚀