Admin access: unified privileged-surface auth + admin-ops control surface + Atlas hardening #97

Merged
jpr5 merged 9 commits into main from blitz/pathfinder-admin-ops on Jun 8, 2026

jpr5 commented Jun 8, 2026

Summary

This PR adds a remotely-triggerable admin control surface to the Pathfinder docs-MCP server AND unifies authentication across all privileged surfaces under a single "admin access" credential.

  • Admin-ops control surface: a POST /admin/:op operation registry with reindex (scope: full / source / repo) and index-stats ops, plus a GET /admin/index-stats alias. Reindex validates the requested source/repo against the configured sources (returning 400 unknown_source / unknown_repo rather than silently accepting an unknown name). Every invocation is audit-logged, and the registry documents extension points for future ops.
  • Unified "admin access" auth: admin-ops, Atlas ratification, and analytics now all authenticate through the shared bearerTokenAuth against the existing ANALYTICS_TOKEN. The admin endpoint reuses this already-provisioned credential, so no new secret is required. (Removed the separate PATHFINDER_ADMIN_TOKEN / dedicated admin auth middleware.)
  • Consistent auth status codes: a present-but-wrong bearer token now returns 401 unauthorized across all privileged surfaces (previously analytics/Atlas returned 403). A missing token still returns 401; no token configured still fails closed with 503; analytics reads still return 404 when analytics is disabled.
  • Atlas hardening: replaced brittle error-message substring matching in the ratification 409 path with a typed AtlasSeedNotPendingError (stable code: "ATLAS_SEED_NOT_PENDING"), matched cross-module by instanceof with a code fallback. Added route-level 409 coverage (approve/reject of a non-pending candidate).
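The status-code rules in the list above can be read as a small decision function. The following is only a sketch of that reading, not the actual bearerTokenAuth: the AuthInput shape, the decideAuth name, and the order in which the 404 and 503 checks run are assumptions made for illustration.

```typescript
// Hypothetical inputs. In a real middleware these would be derived from the
// request and the server config; the field names are illustrative only.
interface AuthInput {
  requireAnalyticsEnabled: boolean; // false for admin-ops, per the PR
  analyticsEnabled: boolean;
  tokenConfigured: boolean; // is ANALYTICS_TOKEN set and non-empty?
  tokenPresented: boolean; // did the request carry a bearer token?
  tokenMatches: boolean; // result of a constant-time comparison
}

type AuthDecision =
  | { allowed: true }
  | { allowed: false; status: 401 | 404 | 503 };

function decideAuth(input: AuthInput): AuthDecision {
  // Analytics reads are hidden (404) when analytics is disabled.
  // Checking this first is an assumption of the sketch.
  if (input.requireAnalyticsEnabled && !input.analyticsEnabled) {
    return { allowed: false, status: 404 };
  }
  // Fail closed: no configured token means nobody gets in.
  if (!input.tokenConfigured) {
    return { allowed: false, status: 503 };
  }
  // Missing and wrong tokens are treated the same way: 401.
  if (!input.tokenPresented || !input.tokenMatches) {
    return { allowed: false, status: 401 };
  }
  return { allowed: true };
}
```

Keeping the decision pure (booleans in, status out) makes each row of the status table testable without an HTTP server.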

Security notes

  • Fail-closed everywhere: when no token is configured the privileged surfaces return 503; constant-time token comparison with a length-guard (avoids the timing-safe-equal length crash); 500/503 bodies never leak err.message (no DB-URL/connstring exposure).
  • The admin endpoint joins the same privilege tier as Atlas ratification, which already gates a mutating action behind this token — no surface receives a weaker credential.
  • Accepted tradeoff: the shared auth honors a dev-only localhost bypass (nodeEnv === "development" + localhost), so a local dev server accepts admin calls without a token. Production is unaffected (the bypass fails closed behind trust_proxy).
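For readers unfamiliar with the length-guard mentioned above: Node's crypto.timingSafeEqual throws a RangeError when its two inputs differ in byte length, so a guard is needed before calling it. A minimal sketch follows; the helper name safeTokenEqual is hypothetical and this is not the PR's actual code.

```typescript
import { Buffer } from "node:buffer";
import { timingSafeEqual } from "node:crypto";

// Hypothetical helper illustrating a length-guarded constant-time compare.
function safeTokenEqual(presented: string, expected: string): boolean {
  const a = Buffer.from(presented, "utf8");
  const b = Buffer.from(expected, "utf8");
  // timingSafeEqual throws if the byte lengths differ, so reject early.
  // This reveals only that the lengths differ, not which bytes differ.
  if (a.length !== b.length) {
    return false;
  }
  return timingSafeEqual(a, b);
}
```

Comparing byte lengths rather than string lengths matters because multi-byte UTF-8 characters can make two strings of equal length encode to buffers of different sizes.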

Deployment

  • No new environment variable. ANALYTICS_TOKEN is already set on the production copilotkit-docs service and now also gates admin-ops. Takes effect on the next deploy.

Testing

  • Full suite green (4375 tests); typecheck + build clean.
  • New coverage: reindex unknown-source/repo 400s, dispatcher-500 no-leak, the 401 standardization across analytics/Atlas, and the Atlas 409 not-pending path.

Review

  • Underwent multiple unbiased 7-agent review rounds (security-focused): first on the initial admin-ops surface, then on the auth-unification change. The final round returned zero must-fix findings and found no regression that broadens or narrows access across analytics/Atlas/admin.
  • Deferred to follow-ups (non-blocking): an Atlas-robustness pass (typed parse errors with row context, toDate validation, moving the approve-time reindex enqueue outside the DB write's try/catch) and minor admin-ops test-coverage/comment polish.

jpr5 added 9 commits June 7, 2026 21:32
…on registry

Introduce POST /admin/:op (plus a GET /admin/index-stats alias) on the existing
server, dispatched through an operation registry so future ops are a one-line
registration. Bearer auth reads PATHFINDER_ADMIN_TOKEN with a length-guarded
constant-time compare and fails closed: when the token is unset/empty the routes
return 503, so the feature can ship before the secret is provisioned. Seed the
registry with reindex (full/source/repo, mapping to the orchestrator's existing
queue methods, returning 202) and index-stats (reusing getIndexStats and the
same projection /health uses). Each invocation is audit-logged and reindex fires
a best-effort Slack notice; the registry documents how to add config-reload,
setting-set, reembed, atlas-cache-invalidate, and smoke as future ops.
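A registry of this kind can be sketched in a few lines. This is an illustrative, synchronous simplification (real handlers would likely be async), and the names registerOp and dispatch, the response bodies, and the error strings unknown_op and internal_error are assumptions rather than the PR's identifiers.

```typescript
type OpResult = { status: number; body: Record<string, unknown> };
type OpHandler = (payload: unknown) => OpResult;

// The registry: op name -> handler. Adding an op is one registerOp call.
const ops = new Map<string, OpHandler>();

function registerOp(name: string, handler: OpHandler): void {
  ops.set(name, handler);
}

function dispatch(name: string, payload: unknown): OpResult {
  const handler = ops.get(name);
  if (!handler) {
    // Unknown ops are a 404, as described in the PR.
    return { status: 404, body: { error: "unknown_op" } };
  }
  try {
    return handler(payload);
  } catch {
    // Never echo the thrown error's message: it may contain a connection string.
    return { status: 500, body: { error: "internal_error" } };
  }
}

// Placeholder handlers standing in for the real reindex / index-stats ops.
registerOp("reindex", () => ({ status: 202, body: { queued: true } }));
registerOp("index-stats", () => ({ status: 200, body: { documents: 0 } }));
```

Centralizing the try/catch in dispatch means a new op cannot accidentally leak error details, since handlers never build their own 500 responses.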

Cover fail-closed 503 when PATHFINDER_ADMIN_TOKEN is unset or empty, 401 on a
missing or invalid token, 202 reindex dispatch for the full/source/repo scopes
against an injected orchestrator, 400 on malformed bodies, 404 on unknown ops,
and the index-stats response shape via an injected getIndexStats.

The reindex op accepted any non-empty source/repo string and returned 202,
then silently no-oped in the orchestrator drain when the name was a typo.
Now scoped reindexes are validated against getServerConfig().sources: an
unknown source returns 400 unknown_source and an unknown repo returns 400
unknown_repo, with the orchestrator only invoked on a match. A throwing
config read fails closed as 503 config_unavailable rather than 202.

Also widen the test-only orchestrator setter to cover all three queue
methods the reindex op calls, log index-stats failures through
formatErrorForLog to match the other admin-ops log sites, and drop the
dead .catch on the fire-and-forget Slack notify (it never rejects).
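The validation described in this commit might look roughly like the sketch below. The SourceConfig shape (name and repo fields), the ReindexRequest type, and the validateReindex function are assumptions for illustration; only the status codes and error strings come from the commit message.

```typescript
// Assumed shape of one configured source; the real config may differ.
interface SourceConfig {
  name: string;
  repo?: string;
}

type ReindexRequest =
  | { scope: "full" }
  | { scope: "source"; source: string }
  | { scope: "repo"; repo: string };

type ValidationResult =
  | { status: 202 }
  | { status: 400; error: "unknown_source" | "unknown_repo" }
  | { status: 503; error: "config_unavailable" };

function validateReindex(
  req: ReindexRequest,
  getSources: () => SourceConfig[],
): ValidationResult {
  // A full reindex names no source or repo, so there is nothing to validate.
  if (req.scope === "full") {
    return { status: 202 };
  }
  let sources: SourceConfig[];
  try {
    sources = getSources();
  } catch {
    // A throwing config read fails closed instead of accepting the request.
    return { status: 503, error: "config_unavailable" };
  }
  if (req.scope === "source") {
    const known = sources.some((s) => s.name === req.source);
    return known ? { status: 202 } : { status: 400, error: "unknown_source" };
  }
  const known = sources.some((s) => s.repo === req.repo);
  return known ? { status: 202 } : { status: 400, error: "unknown_repo" };
}
```

In this sketch the orchestrator would only be invoked when the result is 202, which matches the "only invoked on a match" behavior described above.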

Add red-green coverage for the new validation: unknown source/repo return
400, and the happy-path tests now target a source/repo present in the
getServerConfig mock so they still reach 202. Make the index-stats failure
test deterministic by injecting a throwing getIndexStats and asserting the
exact 503 index_unavailable body with no DB detail leak, and add a dispatch
500 no-leak test driving a throwing reindex handler. Drop the `as never`
casts on the orchestrator fakes now that the setter type is widened, and
supply the missing queue methods on the atlas-ratification fake.

Replace the brittle substring-matched not-pending error in the atlas db
module with an exported AtlasSeedNotPendingError, and detect it by
instanceof (with a defensive code-string fallback) in the ratification
error handler. Add HTTP route tests asserting 409 when approving or
rejecting a missing/non-pending candidate.
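The typed-error pattern with a code fallback can be sketched as follows. The class name and code string come from the PR; the constructor signature, the message text, and the helpers isSeedNotPending and statusForRatificationError are hypothetical.

```typescript
class AtlasSeedNotPendingError extends Error {
  // Stable, machine-readable code that survives message rewording.
  readonly code = "ATLAS_SEED_NOT_PENDING";

  constructor(seedId: string) {
    super(`Atlas seed ${seedId} is not pending`);
    this.name = "AtlasSeedNotPendingError";
  }
}

function isSeedNotPending(err: unknown): boolean {
  if (err instanceof AtlasSeedNotPendingError) {
    return true;
  }
  // Defensive fallback: instanceof can fail when the class was loaded from a
  // second copy of the module, so also accept a matching code property.
  return (
    typeof err === "object" &&
    err !== null &&
    (err as { code?: unknown }).code === "ATLAS_SEED_NOT_PENDING"
  );
}

// Hypothetical mapping used by a route handler: 409 for not-pending, else 500.
function statusForRatificationError(err: unknown): number {
  return isSeedNotPending(err) ? 409 : 500;
}
```

Unlike substring matching, an unrelated error whose message happens to contain "not pending" does not map to 409 here, because only the class identity or the code is checked.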

…-surface auth

A wrong bearer token (length mismatch or byte mismatch) on the shared
analytics/Atlas/admin-ops auth now returns 401 unauthorized instead of
403 forbidden, per RFC 7235. Update the analytics auth tests to assert
the 401 unauthorized shape.

Replace the separate PATHFINDER_ADMIN_TOKEN auth with adminOpsAuth
delegating to the shared bearerTokenAuth (requireAnalyticsEnabled:
false), so the existing ANALYTICS_TOKEN gates analytics, Atlas
ratification, and admin ops with one credential. No new env var. Admin
ops now return 503 when no token is configured, 401 on a missing or
invalid token, and honor the dev-localhost bypass. Rewrite the admin-ops
tests against the shared auth model and drop the as-never casts in favor
of typed fixtures.
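The dev-localhost bypass mentioned here is described in the PR only as nodeEnv === "development" plus localhost. One possible reading is sketched below; the function name, the use of the socket's remote address, and the set of loopback addresses are all assumptions, and the sketch does not model the proxy-related fail-closed behavior.

```typescript
// Loopback addresses as they commonly appear on a Node socket (assumed list).
const LOOPBACK_ADDRESSES = new Set(["127.0.0.1", "::1", "::ffff:127.0.0.1"]);

// Hypothetical check: skip token auth only for local development traffic.
function devLocalhostBypass(
  nodeEnv: string | undefined,
  remoteAddress: string | undefined,
): boolean {
  // Any environment other than "development" never bypasses auth.
  if (nodeEnv !== "development") {
    return false;
  }
  // An unknown remote address is treated as non-local (fail closed).
  return remoteAddress !== undefined && LOOPBACK_ADDRESSES.has(remoteAddress);
}
```

Both conditions must hold, so an unset or production NODE_ENV keeps the token requirement in place even for requests from the same machine.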

Describe the bearer token as the shared admin-access credential
(ANALYTICS_TOKEN) gating analytics, Atlas ratification, and admin ops,
and note the 401-on-wrong-token and dev-localhost-bypass behavior.
jpr5 changed the title from "Add an authenticated admin ops control surface (reindex + index-stats)" to "Admin access: unified privileged-surface auth + admin-ops control surface + Atlas hardening" Jun 8, 2026
jpr5 merged commit f8eb3a4 into main Jun 8, 2026
5 checks passed
jpr5 deleted the blitz/pathfinder-admin-ops branch June 8, 2026 16:56