Skip to content

perf(context): −49% tokens on narrow-symbol tasks + mnemon takeaways doc#491

Open
justrach wants to merge 3 commits into
mainfrom
perf/codedb-context-token-cut
Open

perf(context): −49% tokens on narrow-symbol tasks + mnemon takeaways doc#491
justrach wants to merge 3 commits into
mainfrom
perf/codedb-context-token-cut

Conversation

@justrach
Copy link
Copy Markdown
Owner

TL;DR

Two unrelated changes that pair well:

  1. −49% tokens on `codedb_context` for narrow-symbol tasks (T1 flask shape): skip the "Top sites" snippet section when the agent already got function bodies inline.
  2. `docs/design/mnemon-takeaways.md` — what's worth borrowing (and what isn't) from mnemon-dev/mnemon.

The perf change

In v0.2.5817, when `codedb_context` finds 1–3 symbol definitions for the task's keywords, it inlines ~6 lines of body per symbol AND surfaces non-test/non-import callers. After that, the existing "## Top sites (with ±2 lines of context)" section duplicates information at high token cost — the agent already has enough to answer.

Gate: when `sym_refs.items.len` is in `[1, 3]`:

  • Cap "## Most-relevant files" listing to 3 (was up to 5)
  • Skip "## Top sites" body-snippet section entirely

Measured on T1 flask `"find before_request decorator"` (28 chars, 3 symbol defs):

bytes tokens (approx)
v0.2.5817 binary 2993 ~750
this branch 1525 ~380
Δ −49% −49%

For wider result sets (sym_refs > 3 — e.g. T2 regex with many symbols matching `pattern` / `compile`), the gate doesn't fire and the existing flow runs unchanged. No regression risk on exploratory tasks.

The mnemon takeaways doc

mnemon-dev/mnemon is a persistent-memory system for LLM agents (272 stars, Go + SQLite, four-graph knowledge store, MCP-compatible). Reviewed their design docs and filed concrete takeaways at `docs/design/mnemon-takeaways.md`:

  • ✓ Validates codedb's LLM-Supervised pattern
  • ✓ Cognition-named tool verbs ("intent-native protocol") — codedb should keep adding tools like `codedb_context` / `codedb_callers`, not `codedb_join_outline_word`
  • ✓ Lifecycle hooks (Prime/Remind/Nudge/Compact) suggest a `codedb hooks install` v0.2.5818 follow-up
  • ✓ Effective Importance decay could inspire a graceful-decay staleness model for reader.md (binary hash → tiered freshness)
  • ✗ NOT stealing the four-graph memory model — wrong shape for code search
  • ✗ NOT stealing remember/link/recall API verbatim

Ranked v0.2.5818 candidate list at the bottom of the doc.

Test plan

  • `zig build test` — 485/490 pass (same 5 pre-existing /private/tmp path-policy failures)
  • Manual diff: T1 flask output is 1525 B with this branch vs 2993 B with v0.2.5817
  • Verified output still contains: symbol_definitions with inline bodies, callers section, ranked files
  • End-to-end agent eval — should re-run T1/T2/T3 n=3 each to confirm the token cut doesn't hurt task-completion. Deferred.

🤖 Generated with Claude Code

justrach and others added 2 commits May 21, 2026 14:45
…ippets when bodies already inlined

When sym_refs.items.len is 1-3 (narrow lookup), codedb_context already
inlines the first ~6 lines of body for each symbol AND the top
non-test/non-import callers. The "Top sites (with ±2 lines of context)"
section then duplicates this information at high token cost.

Gate: when have_inline_bodies (1 ≤ sym_refs ≤ 3):
  - cap "Most-relevant files" listing to 3 (was up to 5)
  - skip the "Top sites" body-snippet section entirely

Measured on T1 flask "find before_request decorator" (28 chars, 3
symbol defs all in scaffold.py + tests):

  before (v0.2.5817 binary): 2993 bytes
  after  (this branch):       1525 bytes   (-49%)

The agent still gets:
  - all symbol locations (path:line)
  - ~6 lines of body for each
  - 1-2 non-test callers with scope info
  - top 3 ranked files

…which proved sufficient in the RESULTS-FINAL-WIN.md n=3 eval that
established the branch wins T1 on median.

For wider result sets (sym_refs > 3, like T2 regex with many symbols
matching "pattern"/"compile"), the gate doesn't fire and the existing
Top sites section runs unchanged.

Tests: 485/490 (same 5 pre-existing /private/tmp path-policy failures).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Read mnemon's design docs (DESIGN.md, design/04-graph-model.md,
design/06-lifecycle.md, design/07-integration.md). Filed concrete
takeaways at docs/design/mnemon-takeaways.md.

Headline observations:

- Mnemon validates codedb's LLM-Supervised pattern explicitly
  (binary = deterministic compute; LLM = judgment calls)
- Intent-native protocol (`remember/link/recall`) is the design lesson
  — codedb should keep adding cognition-named tools over operation-named
  ones
- Lifecycle hooks (Prime/Remind/Nudge/Compact) suggest a `codedb hooks
  install` v0.2.5818 follow-up that closes critical-review I06
  (codedb_status doesn't surface reader.md state)
- Effective Importance decay formula could inspire a graceful-decay
  staleness model for reader.md (binary hash → tiered freshness),
  but the v0 binary protocol is fine

What NOT to steal:
  - Four-graph memory model — wrong shape for code search
  - Auto-pruning / soft-delete — codedb's snapshot reflects current
    source, doesn't accumulate
  - remember/link/recall API verbatim — codedb doesn't write
    user-authored facts

Concrete v0.2.5818 candidates ranked by ROI in the doc.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool Base (ns) Head (ns) Delta Abs Delta (ns) Status
codedb_bundle 581465 576132 -0.92% -5333 OK
codedb_changes 61977 65081 +5.01% +3104 OK
codedb_deps 10226 11143 +8.97% +917 OK
codedb_edit 7862 8670 +10.28% +808 NOISE
codedb_find 71404 68172 -4.53% -3232 OK
codedb_hot 105838 114607 +8.29% +8769 OK
codedb_outline 339363 340807 +0.43% +1444 OK
codedb_read 108704 109558 +0.79% +854 OK
codedb_search 160122 171741 +7.26% +11619 OK
codedb_snapshot 332547 338432 +1.77% +5885 OK
codedb_status 14567 14489 -0.54% -78 OK
codedb_symbol 66171 76565 +15.71% +10394 NOISE
codedb_tree 68368 72166 +5.56% +3798 OK
codedb_word 92243 103948 +12.69% +11705 NOISE

8 of 9 Sonnet 4.6 sub-agent samples collected (T2 sample B timed out
beyond eval window). Headline:

  T1 flask (gate FIRES, 3 sym_refs):
    token-opt n=3:  5, 6, 5  → mean 5.33, median 5, best 5, spread 1
    main n=3:        4, 5, 5  → mean 4.67, median 5, best 4, spread 1
    verdict: at-parity-or-noise. Median ties. Spread same. 49% byte
    reduction did NOT cost a call.

  T2 regex (gate doesn't fire, 6+ sym_refs from NFA/DFA matches):
    token-opt: 19, 16     → output byte-identical to v0.2.5817
    verdict: pure sample noise, change is a no-op here

  T3 react (gate doesn't fire, many useEffect/useLayoutEffect):
    token-opt: 7, 15, 16  → mean 12.67, median 15
    verdict: pure sample noise, change is a no-op here

9/9 runs (across both eval branches) returned correct answers — no
quality regression.

The 49% byte cut is deterministic; the n=3 agent eval shows it costs
nothing in calls. This is a free win on narrow-symbol tasks.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@github-actions
Copy link
Copy Markdown

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool Base (ns) Head (ns) Delta Abs Delta (ns) Status
codedb_bundle 570373 580711 +1.81% +10338 OK
codedb_changes 59561 63120 +5.98% +3559 OK
codedb_deps 10365 10462 +0.94% +97 OK
codedb_edit 7088 7337 +3.51% +249 OK
codedb_find 69114 73007 +5.63% +3893 OK
codedb_hot 107553 108713 +1.08% +1160 OK
codedb_outline 333842 341196 +2.20% +7354 OK
codedb_read 107178 116598 +8.79% +9420 OK
codedb_search 160969 165480 +2.80% +4511 OK
codedb_snapshot 310317 329877 +6.30% +19560 OK
codedb_status 14732 14907 +1.19% +175 OK
codedb_symbol 65900 65926 +0.04% +26 OK
codedb_tree 64510 72208 +11.93% +7698 NOISE
codedb_word 97802 100563 +2.82% +2761 OK

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

1 participant