Phase 3: Discovery, catalog & trust (M1 + M2)#16

Merged
kosminus merged 3 commits into main from feat/phase3-catalog-trust
Jun 8, 2026


@kosminus kosminus commented Jun 8, 2026

Owner

Implements Phase 3 from planfull.md — the trust + discovery layer — scoped to M1 (certification & versioning) and M2 (catalog & lineage). Reuses the existing pgvector + keyword search (no tsvector) and uses sqlglot for lineage. Column profiling is deferred to a later milestone.

Milestone 1 — Certification & semantic versioning (migration 007)

  • status / version / certified_by_id / certified_at on metrics, glossary, sample queries (+ cert stamps on saved queries).
  • New SemanticVersion changelog table + versioning_service.py: one governed lifecycle (draft → in_review → certified → deprecated) for all four entity types. Editors submit-for-review/revert; admins certify/deprecate. Certifying validates SQL (read-only blocklist + sqlglot parse). Every content edit and transition appends a snapshot.
  • POST .../{entity}/{id}/status + GET .../versions[/{v}] on the metric/glossary/sample-query/saved-query routers; PUTs bump version + snapshot. Saved-query status changes are routed through the governed lifecycle.
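
As a rough illustration of the lifecycle above, here is a minimal sketch of a role-gated transition table. It is not the code in versioning_service.py: `Entity`, `TRANSITIONS`, `ROLE_RANK`, `InvalidTransition`, and `transition` are hypothetical names, and the exact set of allowed moves (for example, that a certified entity can be reverted to draft, or that admins inherit editor rights) is assumed rather than taken from this PR.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import List, Optional, Tuple

# (from_status, to_status) -> minimum role. ASSUMPTION: this table is illustrative.
TRANSITIONS = {
    ("draft", "in_review"): "editor",       # submit for review
    ("in_review", "draft"): "editor",       # revert
    ("in_review", "certified"): "admin",    # certify
    ("certified", "draft"): "editor",       # revert (assumed to be allowed)
    ("certified", "deprecated"): "admin",   # deprecate
}
# ASSUMPTION: roles are ordered, so an admin can do anything an editor can.
ROLE_RANK = {"viewer": 0, "editor": 1, "admin": 2}


class InvalidTransition(ValueError):
    """Raised for a move the table does not allow (the PR reports a 422 here)."""


@dataclass
class Entity:
    status: str = "draft"
    certified_by_id: Optional[int] = None
    certified_at: Optional[datetime] = None
    history: List[Tuple[str, str]] = field(default_factory=list)


def transition(entity: Entity, to_status: str, user_id: int, role: str) -> Entity:
    required = TRANSITIONS.get((entity.status, to_status))
    if required is None:
        raise InvalidTransition(f"{entity.status} -> {to_status} is not allowed")
    if ROLE_RANK[role] < ROLE_RANK[required]:
        raise PermissionError(f"{to_status} requires role {required}")

    # Record the move, then update the status and the certification stamp.
    entity.history.append((entity.status, to_status))
    entity.status = to_status
    if to_status == "certified":
        entity.certified_by_id = user_id
        entity.certified_at = datetime.now(timezone.utc)
    elif to_status == "draft":
        # Reverting clears the stamp, matching the verification notes.
        entity.certified_by_id = None
        entity.certified_at = None
    return entity
```

Keeping the rules in a single table is one way to get "one governed lifecycle" for several entity types: each router can call the same helper instead of re-encoding the rules.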

Milestone 2 — Catalog & lineage (migration 008)

  • catalog_service.py: hybrid search across tables/columns/metrics/glossary/sample+saved queries/knowledge, reusing the existing pgvector embeddings + keyword scorer (no new full-text infra). Certified-first ranking; GET .../catalog/search + /facets.
  • lineage_service.py + ArtifactDependency: sqlglot parses saved-query/metric SQL into table/column edges on create/update (best-effort, lazy import → graceful no-op if sqlglot absent). Per-artifact "what it touches" (.../{entity}/{id}/lineage) and impact view "what depends on this table" (.../catalog/lineage?table=).
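
To make the "best-effort, lazy import" behavior concrete, here is a minimal sketch of table/column extraction with sqlglot. The name `extract_refs` appears in this PR's commit notes, but the signature, the `Ref` tuple, and the alias-resolution step below are assumptions for illustration, not the contents of lineage_service.py. Resolving references to schema-cache IDs is left out.

```python
from typing import NamedTuple, Set


class Ref(NamedTuple):
    kind: str        # "table" or "column"
    table: str       # table name; "" if a column could not be attributed
    column: str = ""


def extract_refs(sql: str, dialect: str = "postgres") -> Set[Ref]:
    # Lazy import: if the optional dependency is missing, degrade to a no-op.
    try:
        import sqlglot
        from sqlglot import exp
        from sqlglot.errors import SqlglotError
    except ImportError:
        return set()

    # Best-effort: SQL that does not parse yields no edges rather than an error.
    try:
        tree = sqlglot.parse_one(sql, read=dialect)
    except SqlglotError:
        return set()

    # Map each alias (or bare name) to the underlying table name.
    alias_to_table = {t.alias_or_name: t.name for t in tree.find_all(exp.Table)}
    refs = {Ref("table", name) for name in alias_to_table.values()}

    # Attribute each column to its table via the qualifier, when there is one.
    for col in tree.find_all(exp.Column):
        table = alias_to_table.get(col.table, col.table)
        refs.add(Ref("column", table, col.name))
    return refs
```

Because the function returns an empty set when sqlglot is not installed, tests that assert the populated path need the [lineage] extra or a `pytest.importorskip("sqlglot")` guard, as the follow-up commit in this PR describes.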

Frontend

  • Shared CertificationBadge / StatusActions / VersionHistory, wired into the Metrics, Glossary, and Saved Queries pages.
  • New CatalogPage (search + facet sidebar + lineage detail drawer); nav item + route.

Tests & docs

  • 22 new backend unit tests (177 total pass): versioning transitions/role-gating/cert-stamping, sqlglot extraction, catalog ranking.
  • CHANGELOG, README, CLAUDE.md, and planfull.md status table updated. sqlglot added as the optional [lineage] extra and to the backend image.

Verification (live against the sample stack)

  • Migrations 007/008 apply and reverse cleanly (downgrade 006 → upgrade head).
  • Lifecycle: in_review → certified stamps owner/time; invalid transition → 422; changelog records both; revert clears the cert stamp.
  • Catalog search ranks a certified metric ahead of higher-scored drafts.
  • Lineage: a 2-table join produced 6 edges (all resolved to schema-cache IDs) + correct impact view.
  • ruff clean on new files, new modules mypy-clean, npm run build + lint pass.
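
The certified-first check in the list above can be sketched as a two-part sort key. This is only an illustration under assumptions: `Hit` and `rank` are hypothetical names, and it assumes strict tiering (every certified hit before every non-certified one) rather than a score boost, which the PR does not specify.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Hit:
    name: str
    status: str   # e.g. "draft", "in_review", "certified", "deprecated"
    score: float  # combined vector + keyword relevance (higher is better)


def rank(hits: Iterable[Hit]) -> List[Hit]:
    # False sorts before True, so certified hits come first;
    # within each tier, higher scores come first.
    return sorted(hits, key=lambda h: (h.status != "certified", -h.score))
```

With this key, a certified metric scoring 0.4 is returned ahead of a draft scoring 0.9, which is the behavior the verification step describes.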

Notes

  • One fix found in verification: status/PUT endpoints hit MissingGreenlet on the onupdate timestamp after an UPDATE (Postgres uses RETURNING on INSERT but not here) — resolved by refreshing the entity in versioning_service, same pattern as dashboard_service._finalize.
  • Rebuild the backend image for lineage to populate in a running stack (sqlglot now in the Dockerfile).
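
The refresh fix in the first note can be pictured as follows. This is a sketch under assumptions, not the code in versioning_service: `apply_and_refresh` and `SessionLike` are hypothetical names, and `session` is assumed to be a SQLAlchemy `AsyncSession` (only its `flush` and `refresh` coroutines are used). The idea is to reload the row inside the awaited call, so the server-generated onupdate timestamp is already loaded when the response is serialized instead of being lazy-loaded outside the async context.

```python
from typing import Any, Protocol, TypeVar

T = TypeVar("T")


class SessionLike(Protocol):
    """The two AsyncSession coroutines this sketch relies on."""

    async def flush(self) -> None: ...
    async def refresh(self, instance: Any) -> None: ...


async def apply_and_refresh(session: SessionLike, entity: T, **changes: Any) -> T:
    # Apply the edits to the mapped object.
    for field_name, value in changes.items():
        setattr(entity, field_name, value)

    # Emit the UPDATE; server-side onupdate columns are expired afterwards.
    await session.flush()

    # Reload explicitly while still inside an awaited call, so later attribute
    # access (for example during response serialization) does not trigger IO.
    await session.refresh(entity)
    return entity
```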

Still open for Phase 3

  • Column profiling (deferred — natural M3).
  • Deep certify-time validation (missing columns / ambiguous joins), version diff-view UI, a SampleQueries page, and direct dashboard lineage edges.

🤖 Generated with Claude Code

cosmin chauciuc and others added 3 commits June 8, 2026 13:30
Milestone 1 — certification & semantic versioning (migration 007):
- status/version/certified_by/certified_at on metrics, glossary, sample
  queries (+ cert stamps on saved queries)
- SemanticVersion changelog + versioning_service: one governed lifecycle
  (draft -> in_review -> certified -> deprecated) for all four entity types;
  editors submit/revert, admins certify/deprecate; certify validates SQL.
  /status + /versions endpoints; PUTs bump version + snapshot.

Milestone 2 — catalog & lineage (migration 008):
- catalog_service: hybrid search (reuses pgvector + keyword scorer, no
  tsvector) across tables/columns/metrics/glossary/sample+saved
  queries/knowledge, certified-first ranking; /search + /facets
- lineage_service + ArtifactDependency: sqlglot parses saved-query/metric
  SQL into table/column edges on create/update (best-effort, lazy import);
  per-artifact "touches" + impact "depended on by" views

Frontend: shared CertificationBadge/StatusActions/VersionHistory wired into
Metrics/Glossary/SavedQueries; new CatalogPage with search, facets, lineage
drawer; nav + route.

Adds 22 unit tests (177 total). sqlglot added as optional [lineage] extra and
to the backend image. Column profiling deferred to a later milestone.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
The lineage tests assert the populated extraction path, which needs sqlglot
(the optional [lineage] extra). CI installed only [llm,dev,observability], so
extract_refs hit its ImportError no-op branch and the tests failed. Add lineage
to the CI install and guard the module with importorskip so it skips cleanly
where sqlglot isn't present.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
Reflect the CI fix in the docs: the backend install/dev commands and CI note
now include the [lineage] extra (sqlglot), with the importorskip caveat.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
@kosminus kosminus merged commit 5630782 into main Jun 8, 2026
2 checks passed
@kosminus kosminus deleted the feat/phase3-catalog-trust branch June 8, 2026 11:29