Created: 2025-11-02
Feature: Archive Meeting Retrieval & Grounded Interpretation RAG
Prerequisites:
- Python 3.11+ (tested with Python 3.11, 3.12, and 3.13)
- pip package manager
- 4GB+ of available RAM
- Meeting JSON files in the required format (see MeetingRecord in data-model.md)
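The Python version and RAM prerequisites can be checked before installing with a short standard-library script. This is a sketch, not part of Archive-RAG: `check_prerequisites` and `total_ram_bytes` are hypothetical names. It reads total physical RAM through `os.sysconf`, which does not exist on Windows (the RAM check is skipped there) and does not measure how much memory is currently free.

```python
from __future__ import annotations

import os
import sys

MIN_PYTHON = (3, 11)
MIN_RAM_BYTES = 4 * 1024**3  # 4 GB, from the prerequisites list


# Hypothetical helpers, not part of Archive-RAG.
def total_ram_bytes() -> int | None:
    """Total physical RAM in bytes, or None where os.sysconf is unavailable (e.g. Windows)."""
    try:
        return os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
    except (AttributeError, ValueError, OSError):
        return None


def check_prerequisites() -> list[str]:
    """Return a list of problems; an empty list means both checks passed."""
    problems = []
    if sys.version_info < MIN_PYTHON:
        problems.append(f"Python {sys.version.split()[0]} found; 3.11+ required")
    ram = total_ram_bytes()
    if ram is not None and ram < MIN_RAM_BYTES:
        problems.append(f"{ram / 1024**3:.1f} GB of RAM found; 4 GB+ recommended")
    return problems


if __name__ == "__main__":
    issues = check_prerequisites()
    print("\n".join(issues) if issues else "Prerequisites look OK")
```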
Installation:
# Clone repository
git clone <repository-url>
cd Archive-RAG
# Create virtual environment (Python 3.11+ required)
python3 -m venv venv # use python3.11 (or newer) explicitly if python3 is older than 3.11
source venv/bin/activate # On Windows: venv\Scripts\activate
# Install dependencies
pip install -r requirements.txt
# Download spaCy model (for entity extraction)
python -m spacy download en_core_web_sm
# Download embedding model (sentence-transformers will auto-download on first use)
# Optional: pre-download all-MiniLM-L6-v2 manually
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('all-MiniLM-L6-v2')"
Create a directory with meeting JSON files in the following format:
{
"id": "meeting_001",
"date": "2024-03-15T10:00:00Z",
"participants": ["Alice", "Bob", "Charlie"],
"transcript": "Meeting transcript text here...",
"decisions": ["Decision 1", "Decision 2"],
"tags": ["budget", "planning"]
}
Example directory structure:
data/
└── meetings/
├── meeting_001.json
├── meeting_002.json
└── meeting_003.json
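Before indexing, it can help to catch malformed files early. The sketch below checks each file against the fields shown in the example record. Treating all six fields as required, and the types listed, are assumptions drawn from that example, so defer to MeetingRecord in data-model.md where they differ. `validate_record` and `validate_directory` are hypothetical helpers, not Archive-RAG APIs.

```python
from __future__ import annotations

import json
from datetime import datetime
from pathlib import Path

# Assumed schema, inferred from the example record; MeetingRecord in
# data-model.md is authoritative where it differs.
EXPECTED_FIELDS = {
    "id": str,
    "date": str,
    "participants": list,
    "transcript": str,
    "decisions": list,
    "tags": list,
}


def validate_record(record: dict) -> list[str]:
    """Return the problems found in one meeting record (empty list if none)."""
    problems = []
    for field, expected_type in EXPECTED_FIELDS.items():
        if field not in record:
            problems.append(f"missing field '{field}'")
        elif not isinstance(record[field], expected_type):
            problems.append(f"'{field}' should be a {expected_type.__name__}")
    date = record.get("date")
    if isinstance(date, str):
        # Normalise a trailing "Z" so fromisoformat accepts it on older Pythons too.
        iso = date[:-1] + "+00:00" if date.endswith("Z") else date
        try:
            datetime.fromisoformat(iso)
        except ValueError:
            problems.append(f"'date' is not ISO 8601: {date!r}")
    return problems


def validate_directory(directory: str | Path) -> dict[str, list[str]]:
    """Validate every *.json file in a directory; map failing file names to problems."""
    results = {}
    for path in sorted(Path(directory).glob("*.json")):
        try:
            record = json.loads(path.read_text(encoding="utf-8"))
        except json.JSONDecodeError as exc:
            results[path.name] = [f"invalid JSON: {exc}"]
            continue
        if not isinstance(record, dict):
            results[path.name] = ["top-level value must be a JSON object"]
            continue
        problems = validate_record(record)
        if problems:
            results[path.name] = problems
    return results
```

`validate_directory("data/meetings")` returns an empty dict when every file passes; otherwise it maps each failing file name to its list of problems.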
Create a FAISS vector index from meeting JSON files:
# Basic indexing
archive-rag index data/meetings/ indexes/meetings.faiss
# Index with custom embedding model
archive-rag index --embedding-model sentence-transformers/all-mpnet-base-v2 data/meetings/ indexes/meetings.faiss
# Index with PII redaction (recommended)
archive-rag index --redact-pii data/meetings/ indexes/meetings.faiss
Expected Output:
- FAISS index file: indexes/meetings.faiss
- Index metadata: indexes/meetings.faiss.metadata.json
- Audit log: audit_logs/index-{timestamp}.json
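After `archive-rag index` finishes, a script can confirm that the three artifacts listed above exist. This hypothetical helper derives the metadata path by appending `.metadata.json` to the index path, as in the listing. Because the `{timestamp}` format is not specified here, it accepts any `index-*.json` file in the audit log directory.

```python
from __future__ import annotations

from pathlib import Path


# Hypothetical helper, not part of Archive-RAG; file names follow the
# "Expected Output" list, and any index-*.json counts as the audit log.
def missing_index_outputs(index_path: str | Path, audit_dir: str | Path = "audit_logs") -> list[str]:
    """Return the expected indexing artifacts that are not present on disk."""
    index_path = Path(index_path)
    audit_dir = Path(audit_dir)
    metadata_path = index_path.with_name(index_path.name + ".metadata.json")
    missing = [str(p) for p in (index_path, metadata_path) if not p.is_file()]
    if not (audit_dir.is_dir() and any(audit_dir.glob("index-*.json"))):
        missing.append(str(audit_dir / "index-{timestamp}.json"))
    return missing
```

Run from the repository root, `missing_index_outputs("indexes/meetings.faiss")` returns an empty list when indexing produced everything expected.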
Query the indexed meetings:
# Basic query with text output
archive-rag query indexes/meetings.faiss "What decisions were made about budget allocation?"
# Query with JSON output (for programmatic use)
archive-rag query --output-format json indexes/meetings.faiss "What decisions were made?"
Expected Output (text format):
Answer: Based on the meeting records, the following decisions were made about budget allocation...
Citations:
- [meeting_001 | 2024-03-15 | Alice]: "The budget committee decided to allocate $100k to the marketing department."
- [meeting_002 | 2024-04-20 | Bob]: "Additional funding of $50k was approved for Q2 projects."
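Each citation line in the text output is a `[meeting_id | date | speaker]` prefix followed by a quoted excerpt, so scripts working from text output (rather than `--output-format json`) can recover those fields with a regular expression. This sketch assumes one citation per line exactly as shown above, with a `YYYY-MM-DD` date; `parse_citations` is a hypothetical name, and excerpts that span several lines are not handled.

```python
from __future__ import annotations

import re

# Assumed line shape: - [meeting_001 | 2024-03-15 | Alice]: "excerpt"
CITATION_RE = re.compile(
    r'^\s*-\s*\[(?P<meeting_id>[^|\]]+?)\s*\|\s*(?P<date>\d{4}-\d{2}-\d{2})'
    r'\s*\|\s*(?P<speaker>[^\]]+?)\s*\]:\s*"(?P<excerpt>.*)"\s*$'
)


# Hypothetical helper, not part of Archive-RAG.
def parse_citations(output: str) -> list[dict[str, str]]:
    """Extract citation fields from the text-format output of `archive-rag query`."""
    citations = []
    for line in output.splitlines():
        match = CITATION_RE.match(line)
        if match:
            citations.append(match.groupdict())
    return citations
```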
Expected Output (JSON format):
{
"query_id": "uuid-123",
"query": "What decisions were made about budget allocation?",
"answer": "Based on the meeting records...",
"citations": [
{
"meeting_id": "meeting_001",
"date": "2024-03-15",
"speaker": "Alice",
"excerpt": "The budget committee decided..."
}
],
"evidence_found": true,
"model_version": "model-name-v1.0",
"audit_log_path": "audit_logs/query-uuid-123.json"
}
View audit logs for compliance and transparency:
# List all audit logs
archive-rag audit-view
# View specific query log
archive-rag audit-view audit_logs/query-uuid-123.json
# Filter by user ID
archive-rag audit-view --user-id user@example.com
# Export logs from date range
archive-rag audit-view --date-from 2024-01-01 --date-to 2024-12-31 --export logs_2024.json
Discover high-level topics in the meeting archive:
# Run topic modeling with default settings
archive-rag topic-model indexes/meetings.faiss results/topics/
# Run with 20 topics using BERTopic
archive-rag topic-model --num-topics 20 --method bertopic indexes/meetings.faiss results/topics/
Output: results/topics/topics.json with topic clusters and keywords.
Extract named entities from meetings:
# Extract all entities
archive-rag extract-entities indexes/meetings.faiss results/entities/
# Extract only organizations and persons
archive-rag extract-entities --entity-types ORG,PERSON indexes/meetings.faiss results/entities/
Output: results/entities/entities.json with entity list and frequencies.
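The `--entity-types ORG,PERSON` flag restricts output to the listed labels, and the output reports frequencies. The sketch below illustrates that filter-then-count step on (text, label) pairs; it is an illustration under assumptions, not Archive-RAG's implementation, and `count_entities` is a hypothetical name. ORG and PERSON are the label names used by spaCy's English models, such as the en_core_web_sm model installed above.

```python
from __future__ import annotations

from collections import Counter


# Hypothetical illustration, not Archive-RAG's implementation.
def count_entities(entities: list[tuple[str, str]], entity_types: str | None = None) -> Counter:
    """Count (text, label) pairs, keeping only labels in a comma-separated list if given."""
    wanted = {t.strip().upper() for t in entity_types.split(",")} if entity_types else None
    return Counter(
        (text, label) for text, label in entities if wanted is None or label in wanted
    )
```

With spaCy, suitable input pairs come from `[(ent.text, ent.label_) for ent in nlp(text).ents]` after `nlp = spacy.load("en_core_web_sm")`.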
Run evaluation suite to measure factuality and citation compliance:
# Prepare benchmark file (data/benchmarks/eval.json)
# Format: Array of EvaluationCase objects (see data-model.md)
# Run evaluation
archive-rag evaluate indexes/meetings.faiss data/benchmarks/eval.json results/evaluation/
Expected Output:
Evaluation Results:
- Total Cases: 100
- Citation Accuracy: 92% (≥90% required per SC-001)
- Factuality Score: 88%
- Hallucination Count: 0 (required per SC-002)
- Retrieval Latency: 1.5s avg (<2s required per SC-003)
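The three success criteria in the expected output can be checked mechanically, for example as a CI gate. The thresholds below come from the output above (citation accuracy of at least 90% for SC-001, zero hallucinations for SC-002, average retrieval latency under 2 s for SC-003). `EvaluationSummary` and `success_criteria_failures` are hypothetical, and since the layout of the files written to results/evaluation/ is not documented here, the values have to be filled in from the report. Factuality has no stated threshold, so it is not checked.

```python
from dataclasses import dataclass


# Hypothetical container; fill it from the evaluation report.
@dataclass
class EvaluationSummary:
    citation_accuracy: float  # fraction in [0, 1], e.g. 0.92 for 92%
    hallucination_count: int
    avg_retrieval_latency_s: float


def success_criteria_failures(s: EvaluationSummary) -> list:
    """Return one message per violated criterion (SC-001..SC-003); empty means all pass."""
    failures = []
    if s.citation_accuracy < 0.90:
        failures.append(f"SC-001: citation accuracy {s.citation_accuracy:.0%} is below 90%")
    if s.hallucination_count != 0:
        failures.append(f"SC-002: {s.hallucination_count} hallucination(s), expected 0")
    if s.avg_retrieval_latency_s >= 2.0:
        failures.append(f"SC-003: average latency {s.avg_retrieval_latency_s:.2f}s, expected < 2s")
    return failures
```

For the sample results above, `success_criteria_failures(EvaluationSummary(0.92, 0, 1.5))` returns an empty list.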
For reproducible results, use fixed seeds:
# Index with fixed seed
archive-rag index --seed 42 data/meetings/ indexes/meetings.faiss
# Query with fixed seed
archive-rag query --seed 42 indexes/meetings.faiss "What decisions were made?"
# Topic modeling with fixed seed
archive-rag topic-model --seed 42 indexes/meetings.faiss results/topics/
Note: Same input + data state + seed → identical output (constitution principle III).
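One way to test the determinism note is to run the same seeded query twice with JSON output and compare the results. This sketch assumes that `--seed` and `--output-format json` can be combined in one invocation, that the command prints only the JSON document to stdout, and that `query_id` and `audit_log_path` legitimately differ between runs (each run gets its own ID and log) while every other field should match. `run_query`, `comparable`, and `same_result` are hypothetical helpers.

```python
from __future__ import annotations

import json
import subprocess

# Assumption: these fields identify a single run and may differ between
# otherwise identical seeded runs.
VOLATILE_FIELDS = {"query_id", "audit_log_path"}


def run_query(index: str, question: str, seed: int = 42) -> str:
    """Hypothetical helper: run one seeded query and return its stdout.

    Assumes archive-rag is on PATH and accepts --seed together with
    --output-format json.
    """
    cmd = ["archive-rag", "query", "--seed", str(seed),
           "--output-format", "json", index, question]
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout


def comparable(output_json: str) -> dict:
    """Parse JSON query output and drop the per-run fields."""
    data = json.loads(output_json)
    return {key: value for key, value in data.items() if key not in VOLATILE_FIELDS}


def same_result(run_a: str, run_b: str) -> bool:
    """True if two query outputs agree on everything except per-run fields."""
    return comparable(run_a) == comparable(run_b)
```

Calling `same_result(run_query(idx, q), run_query(idx, q))` with the same index path and question should return True if the reproducibility guarantee holds.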
Enable PII redaction during indexing:
archive-rag index --redact-pii data/meetings/ indexes/meetings.faiss
PII entities are redacted before indexing and entity extraction (FR-012).
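Redaction happens inside the tool, but a simple pattern scan is a cheap way to spot obvious leftovers in text that comes back out of the system, such as query answers and citation excerpts. This is a deliberately crude sketch based on regular expressions, not the detection method Archive-RAG uses: it only looks for email addresses and North American-style phone numbers, and `find_residual_pii` is a hypothetical name. An empty result does not prove the text is free of PII.

```python
from __future__ import annotations

import re

# Deliberately simple patterns (an assumption, not the tool's detector):
# they catch obvious leftovers, not all PII.
PII_PATTERNS = {
    "email": re.compile(r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b"),
    "phone": re.compile(r"(?<!\d)(?:\+?\d{1,3}[ .-]?)?(?:\(\d{3}\)|\d{3})[ .-]?\d{3}[ .-]?\d{4}(?!\d)"),
}


# Hypothetical helper, not part of Archive-RAG.
def find_residual_pii(text: str) -> dict[str, list[str]]:
    """Return matches per pattern name; an empty dict means nothing obvious was found."""
    hits = {}
    for name, pattern in PII_PATTERNS.items():
        matches = pattern.findall(text)
        if matches:
            hits[name] = matches
    return hits
```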
Verify SHA-256 hashes of input files:
# Compute hashes only
archive-rag index --hash-only data/meetings/ -
# Verify hash during indexing
archive-rag index --verify-hash <expected-hash> data/meetings/ indexes/meetings.faiss
Hash mismatches are logged as security events (FR-011).
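To obtain expected hashes independently of the tool, per-file SHA-256 digests can be computed with the standard library. Whether `--verify-hash` expects the hash of a single file or an aggregate over the whole directory is not stated here, so compare these per-file values against what `--hash-only` prints before relying on them. `sha256_file` and `hash_directory` are hypothetical helpers.

```python
from __future__ import annotations

import hashlib
from pathlib import Path


# Hypothetical helpers, not part of Archive-RAG.
def sha256_file(path: str | Path, chunk_size: int = 1 << 20) -> str:
    """Hex SHA-256 digest of a file, read in 1 MiB chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def hash_directory(directory: str | Path) -> dict[str, str]:
    """Map each *.json file name in a directory to its SHA-256 digest."""
    return {p.name: sha256_file(p) for p in sorted(Path(directory).glob("*.json"))}
```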
Troubleshooting:
- Index not found
  - Error: Index file not found: indexes/meetings.faiss
  - Solution: Run archive-rag index first to create the index
- No evidence found
  - Response: "No evidence found"
  - Reason: Query does not match any indexed content (FR-008)
  - Solution: Try rephrasing the query or check the index contents
- Model loading failure
  - Error: Failed to load model: model-name
  - Solution: Verify the model path and version, and check dependencies
- Memory limit exceeded
  - Error: Memory limit exceeded (<4GB target)
  - Solution: Use a smaller embedding model or reduce the batch size
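Scripts that call `archive-rag query --output-format json` can detect the "No evidence found" case from the `evidence_found` field instead of matching the answer text. This sketch assumes the field names shown in the JSON output example earlier in this guide; `summarize_query_output` is a hypothetical name.

```python
import json


# Hypothetical helper; field names follow the JSON output example.
def summarize_query_output(output_json: str) -> str:
    """Turn JSON query output into a short report, handling the no-evidence case."""
    result = json.loads(output_json)
    if not result.get("evidence_found", False):
        return "No evidence found: try rephrasing the query or check the index contents"
    lines = [result["answer"]]
    for c in result.get("citations", []):
        lines.append(f"[{c['meeting_id']} | {c['date']} | {c['speaker']}]")
    return "\n".join(lines)
```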
# Show command help
archive-rag --help
archive-rag query --help
# Show version
archive-rag --version
After initial setup and usage:
Archive-RAG/
├── src/ # Source code
├── tests/ # Test suite
├── data/ # Meeting JSON files
│ ├── meetings/ # Input meeting JSON
│ └── benchmarks/ # Evaluation benchmarks
├── indexes/ # Generated FAISS indexes (git-ignored)
│ ├── meetings.faiss
│ └── meetings.faiss.metadata.json
├── audit_logs/ # Audit logs (git-ignored)
│ ├── index-{timestamp}.json
│ └── query-{uuid}.json
└── results/ # Output results
├── topics/ # Topic modeling results
├── entities/ # Entity extraction results
└── evaluation/ # Evaluation results
Next steps:
- Review Constitution: Understand Archive-RAG principles (see .specify/memory/constitution.md)
- Run Tests: Execute the test suite to verify setup
- Index Your Data: Prepare meeting JSON files and create an index
- Query System: Start querying with example questions
- Review Audit Logs: Ensure auditability and transparency
After following this quickstart, verify:
- ✅ Index created successfully
- ✅ Queries return answers with citations in the format [meeting_id | date | speaker]
- ✅ Audit logs created for each query
- ✅ "No evidence found" returned when no matches
- ✅ Citations traceable to retrieved chunks
- ✅ Topic modeling and entity extraction work correctly
- ✅ Evaluation suite runs successfully
- ✅ Reproducible results with fixed seeds
For issues or questions:
- Review the feature specification: specs/001-archive-meeting-rag/spec.md
- Review the data model: specs/001-archive-meeting-rag/data-model.md
- Review the CLI contracts: specs/001-archive-meeting-rag/contracts/cli-commands.md
- Check audit logs for debugging: archive-rag audit-view