Created: 2025-11-02
Feature: Archive Meeting Retrieval & Grounded Interpretation RAG
Description: Represents an archived meeting log entry in JSON format.
Fields:
- id (string, required): Unique meeting identifier
- date (datetime, required): Meeting date and time
- participants (list[string], required): List of participant names or IDs
- transcript (string, required): Full meeting transcript text
- decisions (list[string], optional): List of decisions made in the meeting
- tags (list[string], optional): Categorization tags for the meeting
Validation Rules:
- id must be unique across all meeting records
- date must be a valid ISO 8601 datetime
- transcript must not be empty
- JSON structure must be valid and parseable
- SHA-256 hash computed at ingestion time (FR-011)
- PII detection and redaction applied before indexing (FR-012)
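
The structural rules above can be sketched with the standard library alone. This is a minimal illustration, not the project's actual validator: the function name `validate_meeting_record` and the `seen_ids` set used to track uniqueness are assumptions, and hashing and PII redaction are left out.

```python
import json
from datetime import datetime

# Required fields, taken from the Fields list above.
REQUIRED_FIELDS = ("id", "date", "participants", "transcript")


def validate_meeting_record(raw_json: str, seen_ids: set) -> dict:
    """Parse one meeting JSON document and apply the structural rules.

    Hypothetical helper: raises ValueError on any violation.
    """
    record = json.loads(raw_json)  # JSONDecodeError is a ValueError subclass
    if not isinstance(record, dict):
        raise ValueError("meeting record must be a JSON object")

    missing = [name for name in REQUIRED_FIELDS if name not in record]
    if missing:
        raise ValueError(f"missing required fields: {missing}")

    if record["id"] in seen_ids:
        raise ValueError(f"duplicate id: {record['id']}")

    # fromisoformat covers common ISO 8601 forms; a trailing 'Z' is only
    # accepted on Python 3.11 and later.
    datetime.fromisoformat(record["date"])

    if not record["transcript"].strip():
        raise ValueError("transcript must not be empty")

    seen_ids.add(record["id"])
    return record
```

Keeping the set of seen IDs outside the function lets one ingestion run share it across files; a persistent store would be needed to enforce uniqueness across runs.
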
Relationships:
- One-to-many: One MeetingRecord can have multiple chunks in EmbeddingIndex
- One-to-many: One MeetingRecord can be cited in multiple RAGQuery results
State Transitions:
- Ingested: JSON file read and validated
- Hashed: SHA-256 hash computed and stored
- PII-scanned: Personal information detected and redacted
- Indexed: Content chunked and embedded in FAISS index
- Available: Ready for retrieval in RAG queries
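
One way to keep these states explicit in code is an enumeration with a guarded transition. The sketch below assumes the states are strictly linear in the order listed, which the list suggests but does not state; `MeetingState` and `advance` are illustrative names.

```python
from enum import Enum


class MeetingState(Enum):
    # Values encode the order in which the states are listed above.
    INGESTED = 1
    HASHED = 2
    PII_SCANNED = 3
    INDEXED = 4
    AVAILABLE = 5


def advance(state: MeetingState) -> MeetingState:
    """Return the next state, assuming a strictly linear lifecycle."""
    if state is MeetingState.AVAILABLE:
        raise ValueError("AVAILABLE is the final state")
    return MeetingState(state.value + 1)
```
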
Description: Represents a user query and its processed response with citations.
Fields:
- query_id (string, required): Unique query identifier (UUID)
- user_input (string, required): Original user question
- timestamp (datetime, required): Query execution timestamp
- retrieved_chunks (list[dict], required): Retrieved document chunks with metadata
  - Each chunk: {meeting_id, text, score, chunk_index}
- output (string, required): Generated answer text
- citations (list[dict], required): Verifiable citations in format [meeting_id | date | speaker]
  - Each citation: {meeting_id, date, speaker, excerpt}
- model_version (string, required): Version of LLM used for generation
- embedding_version (string, required): Version of embedding model used
- user_id (string, optional): SSO user identifier (from FR-013)
- evidence_found (boolean, required): Whether credible evidence was found (FR-008)
- audit_log_path (string, required): Path to immutable audit log entry (FR-005)
Validation Rules:
- query_id must be a unique UUID
- citations must match format [meeting_id | date | speaker] (constitution principle II)
- citations must have a traceable source in retrieved_chunks (constitution principle I)
- If evidence_found is false, output must be "No evidence found" (FR-008)
- All citations must reference a meeting_id present in retrieved_chunks
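
A rough sketch of how these rules might be checked on a RAGQuery held as a plain dict. The function name `validate_rag_query` and the choice to return a list of violation messages are assumptions; uniqueness of query_id across queries is not covered here because it needs access to stored queries.

```python
import uuid

NO_EVIDENCE = "No evidence found"  # required output text per FR-008


def validate_rag_query(query: dict) -> list:
    """Return a list of rule violations; an empty list means the query passes."""
    errors = []

    try:
        uuid.UUID(str(query["query_id"]))
    except ValueError:
        errors.append("query_id is not a valid UUID")

    # Traceability: every citation must point at a retrieved chunk's meeting.
    retrieved_ids = {chunk["meeting_id"] for chunk in query["retrieved_chunks"]}
    for citation in query["citations"]:
        for key in ("meeting_id", "date", "speaker"):
            if not citation.get(key):
                errors.append(f"citation is missing '{key}'")
        if citation.get("meeting_id") not in retrieved_ids:
            errors.append(
                f"citation references meeting_id not in retrieved_chunks: "
                f"{citation.get('meeting_id')}"
            )

    if not query["evidence_found"] and query["output"] != NO_EVIDENCE:
        errors.append("output must be 'No evidence found' when evidence_found is false")

    return errors
```
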
Relationships:
- Many-to-many: One RAGQuery cites multiple MeetingRecord chunks
- One-to-one: One RAGQuery has one audit log entry
State Transitions:
- Submitted: User query received
- Embedded: Query converted to embedding vector
- Retrieved: Similar chunks retrieved from FAISS index
- Generated: Answer generated from retrieved context
- Cited: Citations extracted and formatted
- Logged: Audit record created and persisted
- Complete: Response returned to user
Description: Represents the FAISS vector index containing embedded meeting document chunks.
Fields:
- index_id (string, required): Unique index identifier
- version_hash (string, required): SHA-256 hash of index configuration and model versions
- embedding_model (string, required): Name and version of embedding model used
- embedding_dimension (integer, required): Dimension of embedding vectors
- index_type (string, required): FAISS index type (e.g., IndexFlatIP, IndexIVFFlat)
- document_vectors (FAISS index, required): FAISS index containing document embeddings
- metadata (dict, required): Mapping from vector index to document metadata
  - Format: {vector_index: {meeting_id, chunk_index, text, date, participants}}
- total_documents (integer, required): Total number of document chunks indexed
- created_at (datetime, required): Index creation timestamp
- index_path (string, required): File system path to FAISS index file
Validation Rules:
- version_hash must include embedding model version, FAISS index type, and configuration
- metadata must align with document_vectors (one entry per vector)
- Index file must be deterministic for same input and seed (constitution principle III)
- Index must be reproducible given same meeting JSON and model versions
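
A version hash that covers the model version, index type, and configuration could be derived by hashing a canonical JSON serialization, as in the sketch below. The function name `compute_version_hash`, the payload layout, and the config keys in the test data are assumptions; the document only specifies which inputs the hash must cover.

```python
import hashlib
import json


def compute_version_hash(embedding_model: str, index_type: str, config: dict) -> str:
    """SHA-256 over a canonical JSON form of the index-defining inputs.

    sort_keys and fixed separators make the serialization independent of
    dict insertion order, so equal inputs always give the same hash.
    """
    payload = json.dumps(
        {
            "embedding_model": embedding_model,
            "index_type": index_type,
            "config": config,
        },
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()
```

Any change to the model name, index type, or a configuration value produces a different hash, which gives a cheap check that a loaded index matches the running configuration.
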
Relationships:
- One-to-many: One EmbeddingIndex contains multiple MeetingRecord chunks
- One-to-many: One EmbeddingIndex serves multiple RAGQuery retrievals
State Transitions:
- Initialized: Index structure created
- Embedded: Meeting documents chunked and embedded
- Built: FAISS index constructed with vectors
- Versioned: Version hash computed and stored
- Persisted: Index saved to disk
- Loaded: Index loaded from disk for retrieval
Description: Represents a benchmark test case for evaluating RAG system performance.
Fields:
- case_id (string, required): Unique evaluation case identifier
- prompt (string, required): Test query prompt
- ground_truth (string, required): Expected answer content
- expected_citations (list[dict], required): Expected citations in format [meeting_id | date | speaker]
  - Each citation: {meeting_id, date, speaker, excerpt}
- evaluation_metrics (dict, required): Scoring results
  - Keys: citation_accuracy, factuality, hallucination_count, retrieval_precision
- run_timestamp (datetime, required): Evaluation run timestamp
- model_version (string, required): LLM version used for evaluation
- embedding_version (string, required): Embedding model version used
Validation Rules:
- expected_citations must match format [meeting_id | date | speaker] (constitution principle II)
- ground_truth must be non-empty
- evaluation_metrics must include citation_accuracy (≥90% per SC-001)
- evaluation_metrics.hallucination_count must be 0 (SC-002)
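
The two metric thresholds can be expressed as a small gate function. This sketch assumes citation_accuracy is stored as a fraction between 0 and 1 (so 90% is 0.90); the document does not say whether the value is a fraction or a percentage, and `check_evaluation_metrics` is an illustrative name.

```python
CITATION_ACCURACY_MIN = 0.90  # SC-001, assuming a 0..1 fraction


def check_evaluation_metrics(metrics: dict) -> list:
    """Return a list of failed criteria; an empty list means the case passes."""
    failures = []

    if "citation_accuracy" not in metrics:
        failures.append("citation_accuracy is missing")
    elif metrics["citation_accuracy"] < CITATION_ACCURACY_MIN:
        failures.append("citation_accuracy below 90% (SC-001)")

    # A missing hallucination_count is treated as a failure rather than as 0.
    if metrics.get("hallucination_count") != 0:
        failures.append("hallucination_count must be 0 (SC-002)")

    return failures
```
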
Relationships:
- Many-to-many: One EvaluationCase validates multiple MeetingRecord citations
- One-to-one: One EvaluationCase has one evaluation result set
State Transitions:
- Defined: Test case created with prompt and ground truth
- Executed: Query run against RAG system
- Scored: Metrics computed (citation accuracy, factuality, etc.)
- Validated: Results compared against ground truth
- Reported: Results included in evaluation report
- Meeting JSON Read → MeetingRecord created
- Validation → JSON structure and required fields validated
- Hashing → SHA-256 hash computed (FR-011)
- PII Detection → spaCy NER detects personal information (FR-012)
- PII Redaction → Personal information redacted before indexing
- Chunking → Transcript split into chunks (overlapping windows)
- Embedding → Chunks converted to embeddings using sentence-transformers
- Indexing → Embeddings added to FAISS index in EmbeddingIndex
- Metadata Storage → Chunk metadata stored in index metadata mapping
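
The chunking step in the ingestion flow above splits the transcript into overlapping windows. A minimal word-based sketch follows; the window size, the overlap, and the choice of words rather than tokens or sentences as the unit are all assumptions, since the document does not specify them.

```python
def chunk_transcript(transcript: str, chunk_size: int = 200, overlap: int = 50) -> list:
    """Split a transcript into overlapping word windows.

    Each chunk shares `overlap` words with the previous one, so a statement
    that straddles a boundary still appears whole in at least one chunk.
    """
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("require chunk_size > 0 and 0 <= overlap < chunk_size")

    words = transcript.split()
    step = chunk_size - overlap
    chunks = []
    for start in range(0, len(words), step):
        window = words[start:start + chunk_size]
        chunks.append({"chunk_index": len(chunks), "text": " ".join(window)})
        if start + chunk_size >= len(words):
            break  # the last window already reaches the end of the transcript
    return chunks
```

The chunk_index assigned here is the same value that would later be stored in the index metadata mapping alongside meeting_id.
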
- Query Submission → RAGQuery created with user_input
- Query Embedding → User query converted to embedding vector
- FAISS Retrieval → Similar chunks retrieved from EmbeddingIndex
- Context Assembly → Retrieved chunks assembled with metadata
- LLM Generation → Answer generated from retrieved context only
- Citation Extraction → Citations formatted as [meeting_id | date | speaker]
- Evidence Check → If no credible evidence, output set to "No evidence found"
- Audit Logging → Immutable audit record created with full query details
- Response Return → RAGQuery completed with output and citations
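
The citation-formatting and evidence-check steps of the query flow above might look like the sketch below. It treats "credible evidence" as "at least one citation was produced", which is a simplifying assumption; the actual credibility criterion is not defined in this document. `format_citation` and `finalize_response` are illustrative names.

```python
NO_EVIDENCE = "No evidence found"  # fallback output per FR-008


def format_citation(citation: dict) -> str:
    """Render one citation as [meeting_id | date | speaker]."""
    return f"[{citation['meeting_id']} | {citation['date']} | {citation['speaker']}]"


def finalize_response(answer: str, citations: list) -> dict:
    """Apply the evidence check and attach formatted citations.

    Assumption: evidence counts as found when at least one citation exists.
    """
    evidence_found = bool(citations)
    return {
        "output": answer if evidence_found else NO_EVIDENCE,
        "citations": [format_citation(c) for c in citations],
        "evidence_found": evidence_found,
    }
```
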
- Query Execution → RAGQuery state transitions to Submitted
- Processing → Query processed through embedding, retrieval, generation
- Log Creation → Audit log entry created with:
- Query ID, user ID (from SSO), timestamp
- User input, retrieved sources, model versions
- Output, citations, evidence_found flag
- Persistence → Structured JSON log written to audit_logs/ directory
- Retention → Log retained for 3 years per FR-014
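
The persistence step above could be approximated with write-once JSON files, as sketched here. Opening a file in exclusive-create mode only prevents accidental overwrites; real immutability would need storage-level controls that are outside this sketch. The one-file-per-query layout, the `logged_at` field, and the function name `write_audit_log` are assumptions.

```python
import json
from datetime import datetime, timezone
from pathlib import Path


def write_audit_log(entry: dict, log_dir: str = "audit_logs") -> Path:
    """Write one audit entry as structured JSON and return its path."""
    directory = Path(log_dir)
    directory.mkdir(parents=True, exist_ok=True)
    path = directory / f"{entry['query_id']}.json"

    record = dict(entry, logged_at=datetime.now(timezone.utc).isoformat())

    # Mode "x" raises FileExistsError if the file already exists, so an
    # existing entry is never silently replaced.
    with open(path, "x", encoding="utf-8") as handle:
        json.dump(record, handle, sort_keys=True)
    return path
```

The returned path is what a RAGQuery would store in audit_log_path.
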
- ✅ Unique ID validation
- ✅ Date format validation (ISO 8601)
- ✅ Transcript non-empty validation
- ✅ JSON structure validation
- ✅ SHA-256 hash computation
- ✅ PII detection and redaction
- ✅ Citation format validation: [meeting_id | date | speaker]
- ✅ Citation traceability: citations must reference retrieved_chunks
- ✅ Evidence check: "No evidence found" when evidence_found is false
- ✅ Citation-source alignment: citations must match retrieved chunks
- ✅ Version hash includes model version and configuration
- ✅ Metadata alignment with document vectors
- ✅ Deterministic index construction with fixed seed
- ✅ Reproducibility validation
- ✅ Citation format validation
- ✅ Ground truth non-empty validation
- ✅ Citation accuracy ≥90% (SC-001)
- ✅ Hallucination count = 0 (SC-002)
- SHA-256 hash computed for each MeetingRecord at ingestion (FR-011)
- Hash stored in EmbeddingIndex metadata
- Hash verified on index access
- Hash mismatches logged as security events
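
The hash integrity checks above can be sketched with `hashlib` and the standard `logging` module. The logger name "security" and the function names are assumptions, and hashing the raw file bytes (rather than a normalized form of the JSON) is one possible choice that the document does not confirm.

```python
import hashlib
import hmac
import logging

# Hypothetical logger name for security events.
security_log = logging.getLogger("security")


def compute_record_hash(raw_bytes: bytes) -> str:
    """SHA-256 hex digest of a meeting record's raw bytes."""
    return hashlib.sha256(raw_bytes).hexdigest()


def verify_record_hash(meeting_id: str, raw_bytes: bytes, stored_hash: str) -> bool:
    """Recompute the hash on access and log a security event on mismatch."""
    actual = compute_record_hash(raw_bytes)
    if not hmac.compare_digest(actual, stored_hash):
        security_log.warning("hash mismatch for meeting %s", meeting_id)
        return False
    return True
```
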
- Fixed seeds for all random operations (embeddings, LLM inference, FAISS)
- Model versions pinned and tracked
- Index construction deterministic
- Same input + data state → identical output (constitution principle III)
- Immutable audit logs for every RAGQuery (constitution principle V)
- Full provenance: query, sources, model versions, output
- User ID from SSO included (FR-013)
- Logs retained for 3 years (FR-014)
- Structured JSON format enables parsing and analysis