perf(context): −49% tokens on narrow-symbol tasks + mnemon takeaways doc by justrach · Pull Request #491 · justrach/codedb

justrach · 2026-05-21T06:46:56Z

TL;DR

Two unrelated changes that pair well:

−49% tokens on `codedb_context` for narrow-symbol tasks (T1 flask shape): skip the "Top sites" snippet section when the agent already got function bodies inline.
`docs/design/mnemon-takeaways.md` — what's worth borrowing (and what isn't) from mnemon-dev/mnemon.

The perf change

In v0.2.5817, when `codedb_context` finds 1–3 symbol definitions for the task's keywords, it inlines ~6 lines of body per symbol AND surfaces non-test/non-import callers. After that, the existing "## Top sites (with ±2 lines of context)" section duplicates information at high token cost — the agent already has enough to answer.

Gate: when `sym_refs.items.len` is in `[1, 3]`:

Cap "## Most-relevant files" listing to 3 (was up to 5)
Skip "## Top sites" body-snippet section entirely

Measured on T1 flask `"find before_request decorator"` (28 chars, 3 symbol defs):

	bytes	tokens (approx)
v0.2.5817 binary	2993	~750
this branch	1525	~380
Δ	−49%	−49%

For wider result sets (sym_refs > 3 — e.g. T2 regex with many symbols matching `pattern` / `compile`), the gate doesn't fire and the existing flow runs unchanged. No regression risk on exploratory tasks.

The mnemon takeaways doc

mnemon-dev/mnemon is a persistent-memory system for LLM agents (272 stars, Go + SQLite, four-graph knowledge store, MCP-compatible). Reviewed their design docs and filed concrete takeaways at `docs/design/mnemon-takeaways.md`:

✓ Validates codedb's LLM-Supervised pattern
✓ Cognition-named tool verbs ("intent-native protocol") — codedb should keep adding tools like `codedb_context` / `codedb_callers`, not `codedb_join_outline_word`
✓ Lifecycle hooks (Prime/Remind/Nudge/Compact) suggest a `codedb hooks install` v0.2.5818 follow-up
✓ Effective Importance decay could inspire a graceful-decay staleness model for reader.md (binary hash → tiered freshness)
✗ NOT stealing the four-graph memory model — wrong shape for code search
✗ NOT stealing remember/link/recall API verbatim

Ranked v0.2.5818 candidate list at the bottom of the doc.

Test plan

`zig build test` — 485/490 pass (same 5 pre-existing /private/tmp path-policy failures)
Manual diff: T1 flask output is 1525 B with this branch vs 2993 B with v0.2.5817
Verified output still contains: symbol_definitions with inline bodies, callers section, ranked files
End-to-end agent eval — should re-run T1/T2/T3 n=3 each to confirm the token cut doesn't hurt task-completion. Deferred.

🤖 Generated with Claude Code

…ippets when bodies already inlined When sym_refs.items.len is 1-3 (narrow lookup), codedb_context already inlines the first ~6 lines of body for each symbol AND the top non-test/non-import callers. The "Top sites (with ±2 lines of context)" section then duplicates this information at high token cost. Gate: when have_inline_bodies (1 ≤ sym_refs ≤ 3): - cap "Most-relevant files" listing to 3 (was up to 5) - skip the "Top sites" body-snippet section entirely Measured on T1 flask "find before_request decorator" (28 chars, 3 symbol defs all in scaffold.py + tests): before (v0.2.5817 binary): 2993 bytes after (this branch): 1525 bytes (-49%) The agent still gets: - all symbol locations (path:line) - ~6 lines of body for each - 1-2 non-test callers with scope info - top 3 ranked files …which proved sufficient in the RESULTS-FINAL-WIN.md n=3 eval that established the branch wins T1 on median. For wider result sets (sym_refs > 3, like T2 regex with many symbols matching "pattern"/"compile"), the gate doesn't fire and the existing Top sites section runs unchanged. Tests: 485/490 (same 5 pre-existing /private/tmp path-policy failures). Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

Read mnemon's design docs (DESIGN.md, design/04-graph-model.md, design/06-lifecycle.md, design/07-integration.md). Filed concrete takeaways at docs/design/mnemon-takeaways.md. Headline observations: - Mnemon validates codedb's LLM-Supervised pattern explicitly (binary = deterministic compute; LLM = judgment calls) - Intent-native protocol (`remember/link/recall`) is the design lesson — codedb should keep adding cognition-named tools over operation-named ones - Lifecycle hooks (Prime/Remind/Nudge/Compact) suggest a `codedb hooks install` v0.2.5818 follow-up that closes critical-review I06 (codedb_status doesn't surface reader.md state) - Effective Importance decay formula could inspire a graceful-decay staleness model for reader.md (binary hash → tiered freshness), but the v0 binary protocol is fine What NOT to steal: - Four-graph memory model — wrong shape for code search - Auto-pruning / soft-delete — codedb's snapshot reflects current source, doesn't accumulate - remember/link/recall API verbatim — codedb doesn't write user-authored facts Concrete v0.2.5818 candidates ranked by ROI in the doc. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-21T06:49:37Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool	Base (ns)	Head (ns)	Delta	Abs Delta (ns)	Status
`codedb_bundle`	581465	576132	-0.92%	-5333	OK
`codedb_changes`	61977	65081	+5.01%	+3104	OK
`codedb_deps`	10226	11143	+8.97%	+917	OK
`codedb_edit`	7862	8670	+10.28%	+808	NOISE
`codedb_find`	71404	68172	-4.53%	-3232	OK
`codedb_hot`	105838	114607	+8.29%	+8769	OK
`codedb_outline`	339363	340807	+0.43%	+1444	OK
`codedb_read`	108704	109558	+0.79%	+854	OK
`codedb_search`	160122	171741	+7.26%	+11619	OK
`codedb_snapshot`	332547	338432	+1.77%	+5885	OK
`codedb_status`	14567	14489	-0.54%	-78	OK
`codedb_symbol`	66171	76565	+15.71%	+10394	NOISE
`codedb_tree`	68368	72166	+5.56%	+3798	OK
`codedb_word`	92243	103948	+12.69%	+11705	NOISE

8 of 9 Sonnet 4.6 sub-agent samples collected (T2 sample B timed out beyond eval window). Headline: T1 flask (gate FIRES, 3 sym_refs): token-opt n=3: 5, 6, 5 → mean 5.33, median 5, best 5, spread 1 main n=3: 4, 5, 5 → mean 4.67, median 5, best 4, spread 1 verdict: at-parity-or-noise. Median ties. Spread same. 49% byte reduction did NOT cost a call. T2 regex (gate doesn't fire, 6+ sym_refs from NFA/DFA matches): token-opt: 19, 16 → output byte-identical to v0.2.5817 verdict: pure sample noise, change is a no-op here T3 react (gate doesn't fire, many useEffect/useLayoutEffect): token-opt: 7, 15, 16 → mean 12.67, median 15 verdict: pure sample noise, change is a no-op here 9/9 runs (across both eval branches) returned correct answers — no quality regression. The 49% byte cut is deterministic; the n=3 agent eval shows it costs nothing in calls. This is a free win on narrow-symbol tasks. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>

github-actions · 2026-05-21T07:32:23Z

Benchmark Regression Report

Thresholds: 10.00% and 50,000 ns absolute delta

NOISE means the percentage threshold was exceeded, but the absolute delta was too small to fail CI.

Tool	Base (ns)	Head (ns)	Delta	Abs Delta (ns)	Status
`codedb_bundle`	570373	580711	+1.81%	+10338	OK
`codedb_changes`	59561	63120	+5.98%	+3559	OK
`codedb_deps`	10365	10462	+0.94%	+97	OK
`codedb_edit`	7088	7337	+3.51%	+249	OK
`codedb_find`	69114	73007	+5.63%	+3893	OK
`codedb_hot`	107553	108713	+1.08%	+1160	OK
`codedb_outline`	333842	341196	+2.20%	+7354	OK
`codedb_read`	107178	116598	+8.79%	+9420	OK
`codedb_search`	160969	165480	+2.80%	+4511	OK
`codedb_snapshot`	310317	329877	+6.30%	+19560	OK
`codedb_status`	14732	14907	+1.19%	+175	OK
`codedb_symbol`	65900	65926	+0.04%	+26	OK
`codedb_tree`	64510	72208	+11.93%	+7698	NOISE
`codedb_word`	97802	100563	+2.82%	+2761	OK

justrach and others added 2 commits May 21, 2026 14:45

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

perf(context): −49% tokens on narrow-symbol tasks + mnemon takeaways doc#491

perf(context): −49% tokens on narrow-symbol tasks + mnemon takeaways doc#491
justrach wants to merge 3 commits into
mainfrom
perf/codedb-context-token-cut

justrach commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

github-actions Bot commented May 21, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant

Conversation

justrach commented May 21, 2026

TL;DR

The perf change

The mnemon takeaways doc

Test plan

Uh oh!

github-actions Bot commented May 21, 2026

Benchmark Regression Report

Uh oh!

github-actions Bot commented May 21, 2026

Benchmark Regression Report

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

1 participant