fix(explore): skip-trigram files compete in Tier 1; share recall pipeline (#447, #451) #492
justrach wants to merge 2 commits
…sible to search

watcher.zig:446 forces skip_trigram=true for files >64KB. Such files are NOT in self.trigram_index: they live in self.skip_trigram_files and are only reached via Tier 3 of searchContent, which runs AFTER Tier 1 (trigram candidates) fills the result quota.

For a common identifier with a wide spread of incidental mentions in small files plus a canonical definition site in a large source file, Tier 1 saturates with small-file hits and Tier 3 never runs, so the canonical large file is completely invisible from search results.

Real-world repro on this very repo:

    $ codedb search Explorer --max-results 27
    ✓ 27 results, none from src/explore.zig
      (where `pub const Explorer = struct` lives at line 495 and there are
      85 word-index hits for the term in that file)

Fails on main; will pass once large-file Tier 1 priority is fixed.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
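The starvation described above can be modelled in a few lines. This is a minimal sketch, not codedb's code: `takeInOrder` is a hypothetical helper, the two-tier layout is a simplification, and the figure of 40 small-file hits is invented for illustration (only the 85 large-file hits and the quota of 27 come from the repro).

```zig
const std = @import("std");

/// Hypothetical model: tiers are consumed strictly in order, each taking as
/// many hits as it has until the shared `max_results` quota is exhausted.
/// `taken[i]` receives the number of results contributed by tier i.
fn takeInOrder(max_results: usize, tier_hits: []const usize, taken: []usize) void {
    var remaining = max_results;
    for (tier_hits, taken) |available, *out| {
        const n = if (available < remaining) available else remaining;
        out.* = n;
        remaining -= n;
    }
}

test "Tier 1 saturation starves Tier 3" {
    // Index 0 = Tier 1 (small-file trigram hits, count assumed),
    // index 1 = Tier 3 (the large skip-trigram file, 85 hits per the repro).
    const tier_hits = [_]usize{ 40, 85 };
    var taken = [_]usize{ 0, 0 };
    takeInOrder(27, &tier_hits, &taken);

    try std.testing.expectEqual(@as(usize, 27), taken[0]);
    try std.testing.expectEqual(@as(usize, 0), taken[1]);
}
```

With any Tier 1 hit count at or above the quota, the later tier contributes nothing, which is why the large file never appears regardless of how many matches it contains.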
…line (#447, #451)

`searchContent` Tier 0 used `max_results/5` as a per-file cap that starved the canonical large file when fewer than 5 hit-files existed. Tier 1 only walked trigram candidates, so files indexed with `skip_trigram=true` (content >64KB, watcher.zig:446) could only surface via Tier 3, which never ran when small-file noise filled the quota first. `searchContentWithScope` duplicated the entire pipeline, so #447's fix landed in plain search while scope=true and `codedb_callers` stayed stale.

Fix:
- Promote skip-trigram files with word-index hits into Tier 1's ranked candidate set, sorted by (hit-count desc, file-len asc).
- Cap Tier 0 per-file divisor at min(5, distinct-hit-files) so a single canonical file is no longer rationed to 1/5 of max_results.
- Refactor `searchContent` into `searchContentLocked`; rewrite `searchContentWithScope` to call it and annotate scope, eliminating the divergent second pipeline.
- Add issue-451 test covering scope=true recall of skip-trigram files.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
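The first two items of the fix can be sketched as follows. This is an illustration under assumptions, not the PR's implementation: `Candidate`, `rankedBefore`, and `perFileCap` are hypothetical names, the candidate data is invented, and the guard against a zero divisor is my own addition.

```zig
const std = @import("std");

// Hypothetical candidate record; codedb's real structure may differ.
const Candidate = struct {
    path: []const u8,
    hit_count: usize,
    file_len: usize,
};

/// Ordering from the commit message: hit-count descending, then file
/// length ascending as the tie-breaker.
fn rankedBefore(_: void, a: Candidate, b: Candidate) bool {
    if (a.hit_count != b.hit_count) return a.hit_count > b.hit_count;
    return a.file_len < b.file_len;
}

/// Per-file cap with the divisor limited to min(5, distinct_hit_files).
/// The zero guard is an assumption so the sketch never divides by zero.
fn perFileCap(max_results: usize, distinct_hit_files: usize) usize {
    const capped: usize = if (distinct_hit_files < 5) distinct_hit_files else 5;
    const divisor: usize = if (capped == 0) 1 else capped;
    return max_results / divisor;
}

test "large file with many hits ranks first; lone file gets the full quota" {
    var candidates = [_]Candidate{
        .{ .path = "src/small_b.zig", .hit_count = 2, .file_len = 900 },
        .{ .path = "src/explore.zig", .hit_count = 85, .file_len = 70_000 },
        .{ .path = "src/small_a.zig", .hit_count = 2, .file_len = 400 },
    };
    std.mem.sort(Candidate, &candidates, {}, rankedBefore);

    try std.testing.expectEqualStrings("src/explore.zig", candidates[0].path);
    try std.testing.expectEqualStrings("src/small_a.zig", candidates[1].path);
    try std.testing.expectEqual(@as(usize, 27), perFileCap(27, 1));
    try std.testing.expectEqual(@as(usize, 5), perFileCap(27, 9));
}
```

Under the old fixed divisor a single hit-file would have been limited to 27/5 = 5 results; with the capped divisor it may fill all 27.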
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: 06381d7914
    const path_copy = try allocator.dupe(u8, r.path);
    errdefer allocator.free(path_copy);
    const line_text = try allocator.dupe(u8, r.line_text);
    errdefer allocator.free(line_text);
Avoid duplicating scoped search payloads after shared search
`searchContentWithScope` now calls `searchContentLocked` (which already allocates `path` and `line_text` for each hit) and then immediately duplicates both fields again before freeing the originals. For large `max_results` (up to 10,000 via MCP) this doubles per-hit allocation/copy work in a hot query path (`codedb_callers`/scoped search), which is a measurable regression risk for latency and memory churn compared with transferring ownership of the existing slices.
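The ownership transfer suggested here might look like the sketch below. The `Hit` and `ScopedHit` structs and the `annotate` helper are hypothetical stand-ins; codedb's actual result types and field names are not shown in this review and may differ.

```zig
const std = @import("std");

// Hypothetical result shapes, assumed for illustration only.
const Hit = struct {
    path: []u8,
    line_text: []u8,
};

const ScopedHit = struct {
    path: []u8,
    line_text: []u8,
    scope_name: ?[]const u8,
};

/// Move the already-allocated slices into the scoped result instead of
/// duplicating them. After this call the returned ScopedHit owns the
/// memory, so the caller must not free the fields of `hit` separately.
fn annotate(hit: Hit, scope_name: ?[]const u8) ScopedHit {
    return .{
        .path = hit.path,
        .line_text = hit.line_text,
        .scope_name = scope_name,
    };
}

test "annotate transfers ownership without copying" {
    const allocator = std.testing.allocator;
    const hit = Hit{
        .path = try allocator.dupe(u8, "src/explore.zig"),
        .line_text = try allocator.dupe(u8, "pub const Explorer = struct {"),
    };
    const scoped = annotate(hit, "canonical");
    // Free exactly once, through the new owner.
    defer allocator.free(scoped.path);
    defer allocator.free(scoped.line_text);

    // Same backing memory: no second allocation or copy took place.
    try std.testing.expect(scoped.path.ptr == hit.path.ptr);
    try std.testing.expect(scoped.line_text.ptr == hit.line_text.ptr);
}
```

The trade-off is that the intermediate result list can no longer be freed element by element after annotation; only its container should be released once every hit has been moved.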
Summary
- `searchContentWithScope` is rewritten as a thin annotator over `searchContent` (via new `searchContentLocked`), eliminating the divergent second pipeline that caused the fix for #447 (explore: searchContent invisibility for canonical definition sites in files >64KB) to skip `codedb_callers` and scope=true search.
- Tier 0 per-file divisor is capped at `min(5, distinct-hit-files)` so a single canonical file is no longer rationed to 1/5 of `max_results`.

Issues
Closes #451 (scope=true / callers invisibility).
Reaffirms #447 fix and prevents regression of the same bug class.
Test plan
- `zig build test`: both `issue-447` and the new `issue-451` test pass.
- `issue-451` test asserts the canonical large file is reachable via `searchContentWithScope` AND that the scope annotation (`scope_name == "canonical"`) is correctly attached.
- `codedb search Explorer --scope` against this repo and confirm `src/explore.zig` appears.
- `codedb callers Explorer` against this repo.

🤖 Generated with Claude Code