Releases: DeusData/codebase-memory-mcp
v0.8.1
A focused follow-up to v0.8.0.
First-party HTTP server
The graph-UI web server has been reimplemented from scratch as a lean in-house module — a refactor that removes the last third-party server library from the binary. The new transport (src/ui/httpd.c) is purpose-built for what the UI actually needs:
- Localhost-only by construction: binds `127.0.0.1` exclusively, with platform-correct socket options on every OS.
- Strict HTTP/1.1 parsing: hard request caps (16 KB head / 1 MB body), strict CRLF handling, raw path matching, and a per-connection receive deadline.
- Simple by design: one request per connection (`Connection: close`); no keep-alive state machine, no chunked encoding.
A new 28-test transport and routing suite covers the parsing edge cases, CORS policy, RPC dispatch, and shutdown behavior. All routes, status codes, and the localhost-only CORS policy behave exactly as before.
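To make the head cap and the strict CRLF rule concrete, here is a minimal C sketch of scanning a request head. It is not the code in `src/ui/httpd.c`: the names `scan_request_head` and `HEAD_MAX` are hypothetical, only the 16 KB limit comes from the notes above, and body limits and the receive deadline are left out.

```c
#include <stddef.h>

/* Head-size cap taken from the release notes (16 KB); the macro name is hypothetical. */
#define HEAD_MAX (16u * 1024u)

enum head_status {
    HEAD_INCOMPLETE = 0, /* no blank line yet: keep reading */
    HEAD_OK = 1,         /* complete head found */
    HEAD_TOO_LARGE = -1, /* cap reached without a blank line */
    HEAD_BAD_EOL = -2    /* bare LF: strict CRLF violated */
};

/*
 * Scan buf[0..len) for the CRLFCRLF that ends an HTTP/1.1 request head.
 * Every '\n' must be preceded by '\r'. On HEAD_OK, *head_len receives the
 * head length including the terminating blank line.
 */
enum head_status scan_request_head(const char *buf, size_t len, size_t *head_len)
{
    size_t limit = len < HEAD_MAX ? len : HEAD_MAX;
    for (size_t i = 0; i < limit; i++) {
        if (buf[i] != '\n')
            continue;
        if (i == 0 || buf[i - 1] != '\r')
            return HEAD_BAD_EOL;
        /* "\r\n\r\n": this LF at i, the previous CRLF at i-3 and i-2. */
        if (i >= 3 && buf[i - 2] == '\n' && buf[i - 3] == '\r') {
            *head_len = i + 1;
            return HEAD_OK;
        }
    }
    return len >= HEAD_MAX ? HEAD_TOO_LARGE : HEAD_INCOMPLETE;
}
```

In this sketch a bare LF anywhere before the blank line is treated as malformed rather than tolerated, and a head that reaches the cap without terminating is rejected instead of buffered further.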
Also in this release
- Slimmer grammar set: the `nim` grammar (by far the largest vendored grammar at 66 MB) was dropped; 158 languages remain supported.
- Attribution bundle in every archive: release archives now include `THIRD_PARTY_NOTICES.md`, a consolidated file with the license texts of all vendored components. Homebrew and AUR installs place it alongside the binary.
- Richer SBOM: per-component license and version metadata for everything compiled into the binary.
- Test suite: 5,604 tests (+27 vs v0.8.0).
Update
Run `codebase-memory-mcp update` or grab the binaries below; every asset is signed, checksummed, and attested as usual.
Security Verification
All release binaries scanned with 70+ antivirus engines — 0 detections.
| Binary | SHA-256 | VirusTotal |
|---|---|---|
| darwin-amd64 | 6cd48f218bf135a7b831... | 0/72 ✅ |
| darwin-arm64 | 595cedd259200424f3d9... | 0/72 ✅ |
| linux-amd64 | 58a0e1a968dacd324cde... | 0/72 ✅ |
| linux-arm64 | 45d5c33d59d55b998b98... | 0/72 ✅ |
| windows-amd64 | 12375e6a39a31f003d77... | 0/72 ✅ |
v0.8.0
📦 Archived release page — binaries for this version are no longer published. Get the latest release.
The headline: three new type-aware LSP engines. Java, Kotlin, and Rust now get full hybrid LSP resolution — type-aware call resolution through class hierarchies, annotation/attribute linkage, signature-aware overload matching, generics handling, and cross-file type inference — joining C/C++, Python, TypeScript/JavaScript, Go, C#, and PHP for 9 languages under hybrid LSP.
Architecture intelligence
- Leiden community detection: `get_architecture` now surfaces multi-level Leiden communities (replacing single-level Louvain), yielding real module boundaries on kernel-sized graphs, computed in seconds instead of minutes.
- Queryable computation-bottleneck metrics: functions carry complexity, loop-depth, and scan-pattern metrics in the graph, so you can now Cypher-query your codebase for its own algorithmic hotspots. (We used exactly this to find and fix several of our own; see below.)
- Helm support: chart templates, `include` call linkage, and `Chart.yaml` dependency edges; HCL block labels folded into node names; C/C++ preprocessor macros are now first-class Macro nodes (2.3M of them in the Linux kernel).
Stability — the biggest hardening push so far
The release fleet now indexes 16 large OSS repos end-to-end — the Linux kernel (4.8M nodes), elasticsearch, kubernetes, the TypeScript compiler, bitcoin, zig, symfony, and more — with database-level validity gates. That hunt, plus outstanding community reports, closed a long list of crashes:
- C++/templated-code crash family eliminated: uninitialized template-argument arrays, out-of-bounds parameter lookups, unbounded resolve recursion, expression-evaluation blowups — reported across #424, #427/#428/#432, #312, #322/#323, #336, #344, #355, #360, #385.
- Allocator unification (#424): the whole binary — vendored tree-sitter, SQLite, libgit2 — now runs under one allocator, ending a class of cross-allocator heap corruption on complex C++.
- Startup crash-loop fixed (#235, #439): a stack-buffer overflow in the project-list error path could SIGABRT the server for all projects; one corrupt cache DB no longer takes down every session.
- Database integrity, end to end: oversized index keys now spill to overflow pages correctly, all properties JSON is escaped and validated at the producer, the `url_path` index always matches its generated column, and WAL files are checkpointed on close/startup (#387). Every database in the 16-repo fleet passes `PRAGMA integrity_check` cleanly.
- Server lifecycle: stdio servers exit with their parent via a death watchdog (#406), MCP `ping` is supported (#354), JSON-RPC string ids are preserved (#253), and the log level is configurable via `CBM_LOG_LEVEL` (#413).
- Windows: wide-char API for non-ASCII paths (#386), cmd.exe-compatible git history pass (#324), `PATHEXT` probing (#221), and a packaging-walk hang fixed; the full test suite is green on Windows.
- Plus: phantom gRPC routes (#294), `moderate` mode silently dropping subtrees (#411), with excluded directories now reported in indexing results, validator-safe project names (#349), UI flag diagnostics (#350).
Performance at kernel scale
- Memory rework: streaming SQLite dump writer, string interning, allocator page reclaim, and extraction back-pressure. The full Linux kernel indexes in ~4 minutes on a laptop, within a bounded memory budget; cgroup-aware CPU/memory detection and a `CBM_WORKERS` override (#364/#365) make container behavior predictable.
- Quadratic hot paths eliminated, several found by pointing the new bottleneck metrics at our own codebase: C/C++ cross-LSP overload resolution (#410), registry construction and lookup paths in the PHP/Python/Java/Kotlin resolvers, and architecture-boundary computation. Large PHP and Java monorepos see order-of-magnitude faster full indexes (symfony 35x, elasticsearch 9x vs. this cycle's early builds).
- Graph quality improved throughout these fixes: resolution coverage and edge fidelity are better in every supported language.
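The cgroup-aware CPU detection and the `CBM_WORKERS` override can be sketched as two small functions. This assumes a cgroup v2 `cpu.max` file, whose contents are `<quota> <period>` or `max <period>`; the function names and the 4096 sanity cap are illustrative, and the project's real detection may also handle cgroup v1 and memory limits.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/*
 * Effective CPU count from the text of a cgroup v2 "cpu.max" file,
 * rounded up and never above the host count. Falls back to host_cpus
 * when there is no quota ("max") or the text is malformed.
 */
int cpus_from_cgroup_cpu_max(const char *text, int host_cpus)
{
    long long quota = 0, period = 0;
    if (text == NULL || strncmp(text, "max", 3) == 0)
        return host_cpus; /* unlimited: trust the host */
    if (sscanf(text, "%lld %lld", &quota, &period) != 2 || quota <= 0 || period <= 0)
        return host_cpus; /* malformed: trust the host */
    long long cpus = (quota + period - 1) / period; /* ceil(quota / period) */
    return cpus < host_cpus ? (int)cpus : host_cpus;
}

/* An explicit, valid CBM_WORKERS value wins; otherwise use the detected limit. */
int choose_worker_count(const char *env_value, int detected)
{
    if (env_value != NULL && *env_value != '\0') {
        char *end = NULL;
        long v = strtol(env_value, &end, 10);
        if (*end == '\0' && v > 0 && v <= 4096) /* 4096: hypothetical sanity cap */
            return (int)v;
    }
    return detected;
}
```

A caller would read `/sys/fs/cgroup/cpu.max`, pass its text with the host CPU count, then pass `getenv("CBM_WORKERS")` and that result to `choose_worker_count`.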
Query layer
- Cypher: `EXISTS {}` predicates in WHERE, `COUNT(DISTINCT)` (#239), label alternation `(n:A|B)` (#242), label tests in WHERE (#241), `WITH DISTINCT` (#238), string/scalar/entity-introspection functions, multi-argument scalars; unsupported syntax now fails loudly instead of returning silent empties (#373).
- `search_code`: invalid regex errors instead of empty results (#283), `&` in paths, timing in responses, wildcard-free `file_pattern` as substring (#200), `.mjs`/`.cjs`/`.mts`/`.cts` indexed (#197), Blade templates mapped (#258), R `box::use()`/`library()` imports (#218/#219).
A much bigger safety net
- New per-grammar regression suite across all 159 grammars plus probe suites for extraction, edge creation, and LSP resolution — the suite now runs 5,500+ tests under a strict no-skip policy (skips are hard failures).
- Crash reproductions execute as subprocess-isolated tests with exit-signal assertions; ASan/LSan run across the suite; the macOS Intel build leg is now blocking so darwin-amd64 always ships.
Distribution & integrations
- Published to the official MCP Registry and the Glama directory; portable static Linux binary on all install/update paths.
- `install --plan`: machine-readable receipts before any config mutation (#388), Cursor IDE detection (#222), fish-native PATH handling (#319), `$CLAUDE_CONFIG_DIR` respected (#321), SessionStart hooks for Codex, Gemini CLI, and Antigravity (#330), ADR storage unified between MCP tools and the UI (#256).
🙏 Code contributors
This release contains code from 16 community contributors — both directly merged PRs and PRs we distilled onto main (distilled PRs are credited to their authors; thank you for the patience with that workflow):
- @mattall — defensive type validation (PR #427) and out-of-line C++ method attribution (PR #428), plus the precise follow-up diagnosis in #432
- @jjserenity — Windows wide-char path support (#386) and WAL checkpointing (#387), with HC Cheng co-authoring
- @cmeerw — MCP ping support (#354), CLOCK_MONOTONIC timing (#342)
- @romanornr — pending-template-call guard (#322) and expression-evaluation cap (#323)
- @yangsec888 (Sam Li): cgroup-aware CPU/memory detection + `CBM_WORKERS` override (PR #365)
- @nvt-pankajsharma: parent-death watchdog for the stdio server (PR #407)
- @santanusinha — environment-driven log level (PR #414)
- @casualjim — TS-LSP tuple-return arena fix (PR #374) and arena NULL guard
- @Smiie-2 — C++ template default-argument segfault fix (PR #360)
- @jjoos — growable node stacks in channel extraction (PR #339)
- @halindrome (Shane McCarron): `$CLAUDE_CONFIG_DIR` support (PR #321)
- @hippus (Alexey Z): Windows cmd.exe-compatible git history (PR #325)
- @code-by-mahereddy — security & correctness fixes from a codebase audit (PR #352)
- @cyprien-bussy — workspace import resolution + SvelteKit route extraction (PR #369)
- @utilia-ai-wox — workspace/pkgmap route fixes
🙏 Issue reporters
Precise, reproducible reports drove this release's stability work: @ZapAndersson (#424), @mattall (#427/#428/#432), @Charles5277 (#439), @awconstable (#235), @mattepiu (#410), @nvt-pankajsharma (#406), @GonEbal (#308/#408), @Patch76 (#411), @santanusinha (#413), @romanornr (#322/#323), @cmeerw (#312/#342/#354), @jjoos (#305/#340), @Codyzzz-zach (#344), @lilisir0722-crypto (#355), @EnziinSystem (#385), @Al2Klimov (#336), @hippus (#324), @yangsec888 (#364), @mydreamdoctor (#253), @zer09 (#373), @sponger94 (#294), @marcusyoung (#218/#219), @maksodf (#237-#242), @ehendrix23 (#252), @gaia (#258), @speeed76 (#200), @digijoebz (#213), @nishitpatel92 (#220), @Icaruk (#221), @xberry1231 (#222), @memniko (#227), @mandricmihai (#228), @tomerikjansen-crypto (#197), @cryptomaltese (#283), @SavageMessiah (#319), @caioribeiroclw-pixel (#388), @dergachoff (#330), @bschrib (#256), @Nick-CHI (#350), @edecarvalhoreis (#349), @Azami1990 (#367), @Carnival-z (#382).
If we missed anyone, tell us and we'll fix the notes — every report and PR moves this project forward.
Full changelog: v0.7.0...v0.8.0 — 206 commits
Security Verification
All release binaries scanned with 70+ antivirus engines — 0 detections.
| Binary | SHA-256 | VirusTotal |
|---|---|---|
| darwin-amd64 | d3e46236a32a9c9f7315... | 0/72 ✅ |
| darwin-arm64 | cf28ac4e0d2cca9a5a07... | 0/72 ✅ |
| linux-amd64 | ac4cae50ecad302a510f... | 0/72 ✅ |
| linux-arm64 | abde85eea0c0a540afd0... | 0/72 ✅ |
| windows-amd64 | d100a3ea2fd9fc05616a... | 0/72 ✅ |
v0.7.0
📦 Archived release page — binaries for this version are no longer published. Get the latest release.
v0.7.0 — Hybrid LSP across six languages, community contributions, full-platform validation
🧠 The headline: hybrid LSP
This release is the one where the call graph stops being a guess.
For six languages we now ship a hybrid Light Semantic Pass LSP: a per-file, type-aware resolver that runs inside extraction. It tracks scopes, infers expression types, follows imports, walks inheritance, and rewrites the resolved callee on every CALLS edge. The plain tree-sitter pass gives you the structural graph; the LSP pass gives you the right callee. Four of those six languages gained the LSP for the first time in this release.
New LSPs (introduced this cycle)
| Language | What the LSP does | Headline |
|---|---|---|
| Python | scope binding, expression typing, method dispatch, super(), decorators, multi-inheritance, TypedDict, match/case narrowing, walrus, comprehension element typing, generics & cross-file resolution | 100% on the bench; 95% on real-world instance-attribute resolution |
| PHP | Light Semantic Pass, generic templates, narrowing, self/static/$this, trait substitution, @phpstan-type aliases, Closure::bind, variance, 278 unit tests | parity-grade across Laravel/Symfony/Doctrine/Guzzle/PSR idioms |
| TypeScript / JavaScript / JSX / TSX | full TS semantic surface — unions, intersections, function types, JSX element resolution, dts mode, callback-param contextual typing | the full TS dialect family, one resolver |
| C# / .NET | class/struct/record/interface/enum, primary constructors, indexers, accessors, LINQ chains, generics, switch narrowing | C# 12 modern features land on day one |
Sharpened LSPs (existed at v0.6.1; substantially upgraded here)
| Language | What was sharpened |
|---|---|
| C / C++ / CUDA | Tier 2 pre-built per-language cross-LSP registry; template return-type substitution; namespace + class body walkers made O(n) |
| Go | Tier 2 + Tier 3 metadata-driven cross-LSP (skips the AST re-walk entirely); per-file walkers O(n); pooled thread-local parser |
The Tier 1/2/3 architecture (this is the part that makes it tractable)
The LSP work is genuinely expensive — naively, every resolve worker would re-build a typing context per file. Instead it runs in three tiers:
- Tier 1 (per-file LSP inside `cbm_extract_file`): scopes, expression typing, local resolution.
- Tier 2 (pre-built per-language cross-LSP registry): a shared read-only base built once from every project def, then chained from per-file overlays via a fallback pointer. Resolve workers share it; no per-file re-walks of the entire registry. (Now wired for all six languages.)
- Tier 3 (metadata-driven cross-LSP, Go): consumes the `lsp_unresolved` metadata the per-file pass emits and skips the cross-file AST re-walk entirely.
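The Tier 2 idea of a shared read-only base chained from per-file overlays can be sketched as a lookup that walks a fallback pointer. The types and `registry_lookup` below are hypothetical, and each layer uses a linear scan for brevity where a real registry would use a hash table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical registry entry: qualified name -> resolved definition id. */
typedef struct {
    const char *qualified_name;
    int def_id;
} reg_entry;

/*
 * One registry layer. The shared base is built once and never mutated;
 * each per-file overlay points at it through `fallback`, so workers share
 * the base instead of copying or re-walking it.
 */
typedef struct registry {
    const reg_entry *entries;
    size_t count;
    const struct registry *fallback;
} registry;

/* Check this layer first, then each fallback in turn. Returns -1 if absent. */
int registry_lookup(const registry *reg, const char *name)
{
    for (; reg != NULL; reg = reg->fallback) {
        for (size_t i = 0; i < reg->count; i++) {
            if (strcmp(reg->entries[i].qualified_name, name) == 0)
                return reg->entries[i].def_id;
        }
    }
    return -1;
}
```

Because overlays only add or shadow names, a per-file definition such as a local override wins over the base, while everything else resolves through the shared layer.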
Made O(n) on wide files
Every per-file LSP walker (`process_node`, `ast_sweep_shapes`, `apply_jsdoc_signatures`, `infer_implicit_returns`, `*_resolve_calls_in_node`, plus the seven extract-side DFS walkers) used the `for (i=0; i<count; i++) ts_node_child(node, i)` idiom. That's O(i) per call in tree-sitter, so O(n²) over a wide root. Fixtures like TypeScript's `reallyLargeFile.ts` (583k lines, almost all comments) made this catastrophic.
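The cost difference can be reproduced with a toy model in which each node's children form a sibling linked list, so fetching child `i` walks `i` links. This models the O(i) behaviour described above; it is not tree-sitter's internals. In tree-sitter itself, the linear pattern is usually written with a `TSTreeCursor` (`ts_tree_cursor_goto_first_child` / `ts_tree_cursor_goto_next_sibling`), and the project's helper presumably builds on that.

```c
#include <stddef.h>

/* Toy tree node: children form a singly linked sibling list. */
typedef struct toy_node {
    struct toy_node *first_child;
    struct toy_node *next_sibling;
} toy_node;

/* Indexed access: walks i sibling links, so one call costs O(i) steps. */
static toy_node *toy_child(const toy_node *n, size_t i, size_t *steps)
{
    toy_node *c = n->first_child;
    while (c != NULL && i-- > 0) {
        c = c->next_sibling;
        (*steps)++;
    }
    return c;
}

/* Anti-pattern: for (i = 0; i < count; i++) child(node, i) costs O(n^2) steps. */
size_t visit_indexed(const toy_node *n, size_t count)
{
    size_t steps = 0;
    for (size_t i = 0; i < count; i++)
        (void)toy_child(n, i, &steps);
    return steps;
}

/* Cursor style: keep the current position and advance one sibling at a time, O(n). */
size_t visit_cursor(const toy_node *n)
{
    size_t steps = 0;
    for (const toy_node *c = n->first_child; c != NULL; c = c->next_sibling)
        steps++;
    return steps;
}
```

For 1,000 children the indexed loop takes 499,500 link steps against 1,000 for the cursor loop: the same n(n-1)/2 versus n gap that made wide files so slow.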
A new `cbm_lsp_collect_children` O(n) cursor helper and a shared `ts_nstack_push_children` for the extract DFS now apply across all six LSPs and the extractors. Numbers from the dry-run validation:
| Repo | Files | LSP overrides | Before → After |
|---|---|---|---|
| microsoft/TypeScript | 40,689 | 7,432 | 5,100s → 50s (full); advanced 100s |
| dotnet/roslyn | 17,916 | 91,089 | crash → 46s |
| kubernetes (Go) | 20,650 | 80,818 | — → 51s |
| WordPress (PHP) | 3,622 | 7,303 | — → 7s |
| postgres (C) | 4,967 | 44 | — → 8s |
| torvalds/linux | 88,539 | 4,052 | — → 188s (full) / 207s (with LSP cross) |
`lsp_overrides` counts calls whose callee the LSP re-attributed to a more precise target than the textual resolver inferred. For richly-typed languages (C#, Go, TS, PHP), that's thousands to tens of thousands of corrections per repo.
Two real bugs fixed along the way
- `process_node` indexed the NULL-terminated `param_types[]` by the call's arg count; a call with more args than the function declares params read past the terminator → deref of a garbage `CBMType*` → crash. Now bounded by a `param_count` scan.
- `return_type_of` built tuple return types via `cbm_type_tuple(NULL, …)`. Multi-return signatures now thread `ctx->arena` through.
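The first fix amounts to counting the NULL-terminated array before indexing it. A minimal sketch, where `type_ref` stands in for `CBMType` and both function names are hypothetical:

```c
#include <stddef.h>

/* Hypothetical stand-in for the resolver's type handle. */
typedef struct {
    const char *name;
} type_ref;

/* Number of entries before the NULL terminator. */
size_t param_count(const type_ref *const *param_types)
{
    size_t n = 0;
    if (param_types != NULL)
        while (param_types[n] != NULL)
            n++;
    return n;
}

/*
 * Declared type for the parameter matching argument arg_index, or NULL when
 * the call passes more arguments than the callee declares. Indexing
 * param_types[] by arg_index alone would read past the terminator.
 */
const type_ref *param_type_for_arg(const type_ref *const *param_types, size_t arg_index)
{
    return arg_index < param_count(param_types) ? param_types[arg_index] : NULL;
}
```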
Always on
`CBM_MODE_ADVANCED` is removed. LSP resolution runs in every index mode (full, moderate, fast): now that the walkers are O(n), the quality gain (~4% of calls re-attributed on TS, ~14% on Go) is worth the latency in every mode. `"mode":"advanced"` requests fall back to full (identical behaviour to the old advanced mode, so no breaking change). As a side benefit, the 104 `test_incremental.c` failures that were silently caused by incremental indexing running with LSP gated off are now gone.
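Keeping old `"mode":"advanced"` requests working reduces to an alias in mode parsing. The enum and `parse_index_mode` below are a hypothetical sketch; treating a missing mode as full is an assumption, not documented behaviour.

```c
#include <string.h>

/* Hypothetical index modes; "advanced" no longer has a value of its own. */
typedef enum {
    INDEX_MODE_FULL,
    INDEX_MODE_MODERATE,
    INDEX_MODE_FAST,
    INDEX_MODE_INVALID
} index_mode;

index_mode parse_index_mode(const char *s)
{
    /* Legacy "advanced" maps to full, which now always includes LSP resolution. */
    if (s == NULL || strcmp(s, "full") == 0 || strcmp(s, "advanced") == 0)
        return INDEX_MODE_FULL;
    if (strcmp(s, "moderate") == 0)
        return INDEX_MODE_MODERATE;
    if (strcmp(s, "fast") == 0)
        return INDEX_MODE_FAST;
    return INDEX_MODE_INVALID;
}
```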
🌲 Pine Script & new platforms
- Pine Script language support via the `kvarenzn/tree-sitter-pine` grammar, integrated by @vinay-veerappa (#273).
- NetBSD, FreeBSD, and OpenBSD (the full POSIX BSD trifecta) by Christof Meerwald (@cmeerw) (#313).
- Nix flake for reproducible builds by Joseph Voss (@josephvoss) (#265).
- Windows code search via PowerShell + Windows-side store integrity guards by @noctrex (#310, #311).
🔗 Graph & extraction
- `USAGE` edges for decorator applications (Python/TS frameworks light up properly) by Matthew Prock (@map588) (#208).
- `INHERITS` edges fully emitted for Java `extends` + `implements` by Loay Chlih (@loaychlih) (#279).
- Temporal properties on `FILE_CHANGES_WITH` edges and `File` nodes by Adam Schulte (#257).
- C# 12 primary-constructor + field/property extraction, typed-stub factory constructors, and satellite-galaxy / cross-galaxy UI by @sponger94.
- Build-tool path-alias resolution for cross-file imports and mode-skipped file preservation during incremental indexing by Peter Cox (#243, #251).
- ES imports from embedded `<script>` blocks by James (@jmcmacnz) (#224).
- Growable arena-allocated traversal stacks (no more fixed-size truncation) by Ahmed Mohammed (#217).
⚡ Performance (beyond LSP)
- `search_graph` regex cache + LIKE pre-filter + cheap count, and a two-step FTS5 sub-query to kill multi-minute `search_graph query=` latency, by Austen Constable (@awconstable) (#300 and follow-ups).
- Verstable v2.2.1 hash table (by Jackson Allan) vendored as the new `CBMHashTable` engine: hot-path resolve speedup; 14 production callers (graph_buffer, registry, pipeline passes, watcher, semantic, …) get the new impl transparently.
- Extract O(n²) eliminated across seven whole-tree DFS walkers (calls, channels, env-accesses, imports, type-assigns, variables), the same `ts_node_child(node, i)` idiom that bit the LSP. microsoft/TypeScript full-mode: 5,097s → 50s (102×).
- Tree-sitter source-buffer padding in every parse pass eliminates a latent benign over-read at parse boundaries.
🛡️ Security
HackerOne researcher submissions landed by their original authors:
- `search_code` multi-word regex patch by Jan Deelstra (#304).
- `sqlite_writer` index-cell page guard by Jos Joosten (#303).
- GitHub Actions shell injection in `_build.yml` fixed by Dustin Obrecht (@dLo999) (#249).
Plus:
- `search_graph` default limit capped at 200 (was 500k; DoS hardening) by @amitmynd (#231).
- Cypher / store buffer-overflow, OOM, and NULL-stmt crash hardening + thread-safety race elimination in log mutex, watcher, and indexer by Matthew Prock (@map588) (#206, #207).
- `ws` bumped to 8.21.0 (GHSA-58qx-3vcg-4xpx / CVE-2026-45736).
💾 Reliability & store
- `CBM_SQLITE_MMAP_SIZE` env-controlled mmap + PASSIVE checkpoint to prevent file-shrink under concurrent readers by @edwardmhughes (#315 + follow-up).
- `cbm_store_get_architecture` wired into the MCP handler by Oliver Evans (@OliverEvans96) (#281).
- `safe_free`/`safe_str_free`/`safe_buf_free`/`safe_grow` memory helpers, with rollout to heavy sites, by Matthew Prock (@map588).
- `trace_path` qualified-name fallback, and `list_projects` no longer hides `tmp-`-prefixed projects, by Justin Wiley (NVIDIA).
- `.m` extension content-based disambiguation by @KuaaMU (#306).
- AUR package docs by Chris Werner Rau (@cwrau) (#278).
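An env-controlled mmap size typically means parsing a byte count with a safe fallback and then applying it through SQLite's `PRAGMA mmap_size`. The parser below is a hypothetical sketch; the fallback value the project actually uses is not stated in these notes.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Byte count from an environment value such as CBM_SQLITE_MMAP_SIZE.
 * Returns fallback when the value is unset, empty, non-numeric, negative,
 * or out of range.
 */
long long mmap_size_from_env(const char *value, long long fallback)
{
    if (value == NULL || *value == '\0')
        return fallback;
    char *end = NULL;
    errno = 0;
    long long n = strtoll(value, &end, 10);
    if (errno != 0 || *end != '\0' || n < 0)
        return fallback;
    return n;
}
```

A caller would pass `getenv("CBM_SQLITE_MMAP_SIZE")` and apply the result per connection, for example `PRAGMA mmap_size=268435456;`, where 0 disables memory-mapped I/O. The PASSIVE checkpoint is available as `PRAGMA wal_checkpoint(PASSIVE)` or `sqlite3_wal_checkpoint_v2` with `SQLITE_CHECKPOINT_PASSIVE`, which checkpoints what it can without waiting on readers or writers.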
✅ CI & build
The dry-run pipeline is fully green on every platform: lint (cppcheck + clang-format-20), security-static, CodeQL gate, test, build, smoke, quick soak — Windows + macOS arm64/intel + linux amd64/arm64.
🔍 Disclosures (release-readiness transparency)
Some changes in the CI-greening sweep were judgment calls, not pure fixes. Flagging them so reviewers can second-guess:
- LSP benchmarks (`pylsp_bench_resolution_ratio`, `cslsp_bench_resolution_ratio`): gated behind `CBM_SKIP_PERF` for the dry run; in the release CI (`skip_perf=false`) they run with ASan-aware time budgets (×10 under sanitizers) and free the result before asserting, so any future budget miss doesn't leak. The call-resolution-ratio quality is also asserted by the non-bench `test_py_lsp` and `test_cs_lsp` suites, which always run in CI.
- Security audit allow-list extensions (each reviewed; documented inline):
  - URL allow-list: https://www.sqlite.org/c3ref/c_checkpoint_full.html, a doc reference i...
v0.6.1
📦 Archived release page — binaries for this version are no longer published. Get the latest release.
v0.6.1 — 89 New Languages, Cross-Repo Intelligence, Team-Shared Graph Artifact, npm+PyPI Distribution
50+ commits since v0.6.0. Adds 89 tree-sitter grammars (66 → 155 languages), introduces cross-repo intelligence with CROSS_* edges, ships team-shared graph artifacts (.codebase-memory/graph.db.zst), introduces full distribution wrappers (npm/PyPI/Homebrew/Scoop/Winget/Chocolatey/AUR/Go) with npm + PyPI now auto-publishing as part of the release pipeline, and rolls out comprehensive installer security hardening.
Languages & Parsing — 66 → 155
- 89 new tree-sitter grammars vendored, with vocabulary-cleaned tokenization and grammar security audit script
- Lang spec coverage filled in for 114 languages with proper node types: Go (`func_literal`), JS (`do_statement`, fixed stale `case_clause`), C#/Python imports, shared arrays
- 77 new extension mapping tests covering the new languages
- C#, Rust, Scala grammars updated to latest upstream
- `lang_specs` refactor: designated initializers + factory pointer
Cross-Repo Intelligence
- `CROSS_*` edge types for cross-repo dependencies and architectural relationships
- gRPC / GraphQL / tRPC service detection with protobuf Route extraction
- gRPC stub detection in call resolution + chained call extraction
- Multi-galaxy UI layout + cross-repo architecture summary view
Team-Shared Graph Artifact
- `.codebase-memory/graph.db.zst`: zstd-compressed knowledge graph that can be committed to the repo. Teammates bootstrap from the artifact instead of running a full reindex from scratch.
- Vendored zstd 1.5.7 (amalgamated, ~52K LOC) for 8–13:1 compression
- Two-tier export: `zstd -9` + index stripping + `VACUUM INTO` for explicit indexes (best ratio); `zstd -3` for watcher/incremental auto-updates (low latency)
- Import path: decompress → integrity check → auto-recreate indexes
- Auto-bootstrap in `index_repository`: when no local DB exists but the artifact is present, import first, then run incremental indexing
- Auto-creates `.gitattributes` with `merge=ours` to prevent merge conflicts on the binary artifact
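The bootstrap rule and the two export tiers above reduce to small decisions. A sketch with hypothetical names: the compression levels mirror the notes, while stripping indexes only on explicit index runs is an assumption drawn from the description of the best-ratio tier.

```c
#include <stdbool.h>

typedef enum {
    BOOT_INCREMENTAL,             /* local DB present: update it in place */
    BOOT_IMPORT_THEN_INCREMENTAL, /* no DB but artifact committed: import, then catch up */
    BOOT_FULL_INDEX               /* neither: index from scratch */
} boot_action;

boot_action choose_boot_action(bool local_db_exists, bool artifact_exists)
{
    if (local_db_exists)
        return BOOT_INCREMENTAL;
    if (artifact_exists)
        return BOOT_IMPORT_THEN_INCREMENTAL;
    return BOOT_FULL_INDEX;
}

typedef struct {
    int zstd_level;     /* 9 for explicit index runs (best ratio), 3 for auto-updates (low latency) */
    bool strip_indexes; /* drop indexes before VACUUM INTO; they are recreated on import */
} export_plan;

export_plan choose_export_plan(bool explicit_index_run)
{
    export_plan plan;
    plan.zstd_level = explicit_index_run ? 9 : 3;
    plan.strip_indexes = explicit_index_run;
    return plan;
}
```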
Imports & Channels
- Generic package/module resolution for IMPORTS edges across 10 languages (resolves bare specifiers like `@myorg/pkg`, `github.com/foo/bar`, `use my_crate::foo` via manifest scanning: `package.json`, `go.mod`, `Cargo.toml`, `pyproject.toml`, `composer.json`, `pubspec.yaml`, `pom.xml`, `build.gradle`, `mix.exs`, `*.gemspec`)
- Channel detection expanded from JS/TS to 8 languages
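Once a manifest yields a package name, a bare specifier still has to be matched on a component boundary so that `@myorg/pkg` does not also claim `@myorg/pkg-extra`. A hypothetical sketch of that check, with the separator passed in (`/` for npm and Go paths, `::` for Rust paths):

```c
#include <stdbool.h>
#include <string.h>

/*
 * True when `spec` names the package `pkg` exactly or a sub-path of it,
 * e.g. "@myorg/pkg/util" for "@myorg/pkg", but not a look-alike prefix
 * such as "@myorg/pkg-extra".
 */
bool specifier_matches_package(const char *spec, const char *pkg, const char *sep)
{
    size_t n = strlen(pkg);
    if (strncmp(spec, pkg, n) != 0)
        return false;
    return spec[n] == '\0' || strncmp(spec + n, sep, strlen(sep)) == 0;
}
```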
Distribution
Now installable directly from public package registries:
```shell
npm install -g codebase-memory-mcp                                  # npm
pip install codebase-memory-mcp                                     # PyPI
go install github.com/DeusData/codebase-memory-mcp/pkg/go@latest    # Go
```

- npm + PyPI auto-publish integrated into the release pipeline (`publish-registries` job after `verify`, then an atomic `publish-final` un-drafts the GitHub release only after both registries succeed, so there is no half-shipped state)
- npm package uses `--provenance` (GitHub OIDC build attestations visible on npmjs.com)
- Full distribution wrappers in `pkg/` for: npm, PyPI, Homebrew, Scoop, Winget, Chocolatey, AUR, Go
Security Hardening
- PyPI installer: hardened against tar-slip and scheme-confusion attacks (PR #248 by @dLo999, closes #246)
- npm installer: checksum verification, HTTPS-only redirects, no shell injection
- Cross-installer hardening: removed `Unblock-File`, added HTTPS-only URL validation
- vite bumped to 6.4.2: fixes advisories GHSA-4w7w-66w2-5vf9 and GHSA-p9ff-h696-f583
- Grammar security audit added to vendor pipeline
- README: VirusTotal scan links (binary hashes), SLSA badge, Security & Trust section, transparency disclaimer, responsible-disclosure invitation
- arXiv paper badge + citation
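A tar-slip guard rejects archive member names that could land outside the extraction directory, and a scheme check keeps downloads on HTTPS. The C sketch below only illustrates the checks (the PyPI installer itself is not C), the function names are hypothetical, and it does not cover symlink or hardlink members, whose targets a complete guard must also validate.

```c
#include <stdbool.h>
#include <string.h>

/*
 * Reject member names that could escape the extraction directory:
 * empty names, absolute POSIX paths, backslashes (Windows separators and
 * UNC prefixes), drive prefixes such as "C:", and any ".." component.
 */
bool archive_member_is_safe(const char *name)
{
    if (name == NULL || name[0] == '\0')
        return false;
    if (name[0] == '/' || strchr(name, '\\') != NULL)
        return false;
    if (name[1] == ':')
        return false; /* drive letter or drive-relative path */
    const char *p = name;
    while (*p != '\0') {
        const char *slash = strchr(p, '/');
        size_t len = slash ? (size_t)(slash - p) : strlen(p);
        if (len == 2 && p[0] == '.' && p[1] == '.')
            return false; /* parent-directory component */
        if (slash == NULL)
            break;
        p = slash + 1;
    }
    return true;
}

/* Strict, case-sensitive HTTPS-only check for download and redirect URLs. */
bool url_is_https(const char *url)
{
    return url != NULL && strncmp(url, "https://", 8) == 0;
}
```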
Stability & Quality
- `get_graph_schema` now exposes property definitions per node label
- sqlite_writer overflow pages: fixes SIGBUS on large records (#139)
- RSS reclamation after `delete_project`: explicit `mem_collect` + immediate purge
- MCP tools / CLI: improved error handling, diagnostics, and cancellation
- Cherry-picked extraction & Cypher improvements from PR #162
Editor / Agent Integration
- Kiro CLI support (#96)
Platform Fixes
- Windows: `pass_pkgmap` now uses `cbm_strndup` (mingw clang lacks POSIX `strndup`)
- `test_watcher`: uses `GIT_AUTHOR_*`/`GIT_COMMITTER_*` env vars instead of mutating global git config
CI / Smoke
- Smoke test JSON parsing fixed — CLI default mode unwraps the MCP envelope; smoke now parses the inner JSON directly
- Binary string audit allowlist: the `telnet` URI scheme from the rst grammar is documented as a known-benign match
Contributors
Thanks to everyone who contributed to this release:
Full changelog: v0.6.0...v0.6.1
Security Verification
All release binaries scanned with 70+ antivirus engines — 0 detections.
| Binary | SHA-256 | VirusTotal |
|---|---|---|
| darwin-amd64 | 7836878876c8956f6413... | 0/72 ✅ |
| darwin-arm64 | 3e72c8cb364c431d99f1... | 0/72 ✅ |
| linux-amd64 | 7e6624b345f994afb901... | 0/72 ✅ |
| linux-arm64 | ac2498c45235c1bf37f8... | 0/72 ✅ |
| windows-amd64 | d773be23ed0823d58677... | 0/72 ✅ |
v0.6.0
📦 Archived release page — binaries for this version are no longer published. Get the latest release.
v0.6.0 — Semantic Search, SIMILAR_TO Edges & Cross-Language Intelligence
85+ commits since v0.5.7. Major release adding vector-based semantic search, structural near-clone detection, cross-language import resolution, and significant quality-of-life improvements across all platforms.
Semantic Search & Vector Embeddings
- `semantic_query` tool: keyword-based vector search across the entire codebase graph via the `cbm_cosine_i8` SQL function
- Nomic nomic-embed-code embeddings: 40K pretrained token vectors (768d int8), distilled from nomic-ai/nomic-embed-code with simulated attention
- 11-signal combined scoring: TF-IDF, Reflective Random Indexing, API/Type/Decorator signatures, AST structural profiles, approximate data flow, Halstead-lite metrics, MinHash, module proximity, graph diffusion
- `SEMANTICALLY_RELATED` edges: connect functions with vocabulary mismatch but similar purpose (score >= 0.80, max 10 per node, same-language only)
- Per-keyword min-cosine scoring replaces merged vector averaging for better precision
- Score clamping to [0,1] — proximity multiplier no longer pushes scores above 1.0
- Clone deduplication: SIMILAR_TO pairs with Jaccard >= threshold skip SEMANTICALLY_RELATED
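The int8 cosine, the [0,1] clamp, and the `SEMANTICALLY_RELATED` gate can be sketched as follows. This is not `cbm_cosine_i8`: the names are hypothetical, the per-node cap of 10 is omitted, and a small Newton-iteration square root keeps the example free of a libm dependency.

```c
#include <stddef.h>
#include <stdint.h>

/* Newton's method square root; avoids linking libm for this sketch. */
static double sqrt_newton(double x)
{
    if (x <= 0.0)
        return 0.0;
    double r = x > 1.0 ? x : 1.0; /* start at or above sqrt(x) */
    for (int i = 0; i < 64; i++)
        r = 0.5 * (r + x / r);
    return r;
}

/* Cosine similarity of two int8 vectors; 64-bit accumulation avoids overflow. */
double cosine_i8_sketch(const int8_t *a, const int8_t *b, size_t dim)
{
    int64_t dot = 0, na = 0, nb = 0;
    for (size_t i = 0; i < dim; i++) {
        dot += (int64_t)a[i] * b[i];
        na += (int64_t)a[i] * a[i];
        nb += (int64_t)b[i] * b[i];
    }
    if (na == 0 || nb == 0)
        return 0.0; /* a zero vector has no direction */
    return (double)dot / (sqrt_newton((double)na) * sqrt_newton((double)nb));
}

/* Clamp a combined score into [0, 1] so multipliers cannot push it above 1.0. */
double clamp01(double x)
{
    return x < 0.0 ? 0.0 : (x > 1.0 ? 1.0 : x);
}

/* Edge gate from the notes: clamped score >= 0.80 and same language. */
int should_emit_related_edge(double score, int lang_a, int lang_b)
{
    return clamp01(score) >= 0.80 && lang_a == lang_b;
}
```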
SIMILAR_TO Edges (Near-Clone Detection)
- MinHash fingerprinting: 64-hash signatures from leaf-only AST tokens with structural weighting
- LSH index: band-based locality-sensitive hashing for O(1) candidate retrieval
- Parallel scoring: worker pool queries LSH, scores candidates, emits edges
- Unique trigram gate filters trivially short functions
- `SIMILAR_TO` edges with Jaccard similarity and same-file flag in properties
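MinHash signatures and LSH banding can be sketched in a few functions. This is a generic textbook version with a splitmix64-style mixer, not the project's fingerprinting: plain token ids stand in for the weighted leaf-only AST tokens, and the banding check compares two signatures directly where a real index would hash each band into buckets to get O(1) candidate retrieval.

```c
#include <stddef.h>
#include <stdint.h>

#define MINHASH_K 64 /* 64-hash signatures, as in the notes */

/* splitmix64 finalizer: a bijective 64-bit mixer. */
static uint64_t mix64(uint64_t x)
{
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* Slot k holds the minimum of hash function k over all tokens. */
void minhash_signature(const uint64_t *tokens, size_t n, uint64_t sig[MINHASH_K])
{
    for (size_t k = 0; k < MINHASH_K; k++) {
        uint64_t seed = mix64(k + 1);
        sig[k] = UINT64_MAX;
        for (size_t i = 0; i < n; i++) {
            uint64_t h = mix64(tokens[i] ^ seed);
            if (h < sig[k])
                sig[k] = h;
        }
    }
}

/* The fraction of equal slots estimates the Jaccard similarity of the token sets. */
double minhash_similarity(const uint64_t a[MINHASH_K], const uint64_t b[MINHASH_K])
{
    size_t same = 0;
    for (size_t k = 0; k < MINHASH_K; k++)
        same += (a[k] == b[k]);
    return (double)same / MINHASH_K;
}

/* LSH banding: candidates if any band of rows_per_band slots matches entirely. */
int lsh_band_collides(const uint64_t a[MINHASH_K], const uint64_t b[MINHASH_K],
                      size_t rows_per_band)
{
    if (rows_per_band == 0)
        return 0;
    for (size_t start = 0; start + rows_per_band <= MINHASH_K; start += rows_per_band) {
        size_t r = 0;
        while (r < rows_per_band && a[start + r] == b[start + r])
            r++;
        if (r == rows_per_band)
            return 1;
    }
    return 0;
}
```

Because the mixer is a bijection, disjoint token sets can never share a minimum in this sketch, so their estimated similarity is exactly 0; identical sets in any order score exactly 1.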
Full-Text Search
- BM25 full-text search via FTS5 with the `cbm_camel_split` tokenizer (camelCase/snake_case aware)
- Incremental FTS5 rebuild on index updates
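A camelCase/snake_case-aware split could look like the sketch below. It is not `cbm_camel_split`; the function name and the rule of keeping an acronym run together until the next capitalized word are assumptions.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Split an identifier on '_' and on camelCase boundaries, writing lowercase
 * tokens separated by single spaces into out (capacity cap, NUL-terminated).
 * "parseHTTPRequest" -> "parse http request". Returns the output length.
 */
size_t camel_snake_split(const char *ident, char *out, size_t cap)
{
    size_t o = 0;
    int pending_space = 0;
    if (cap == 0)
        return 0;
    for (size_t i = 0; ident[i] != '\0'; i++) {
        unsigned char c = (unsigned char)ident[i];
        if (c == '_') { /* snake_case separator */
            pending_space = (o > 0);
            continue;
        }
        if (i > 0 && isupper(c)) {
            unsigned char prev = (unsigned char)ident[i - 1];
            unsigned char next = (unsigned char)ident[i + 1];
            /* aB, 2B, or the last capital of an acronym before a lowercase run (HTTPRequest) */
            if (islower(prev) || isdigit(prev) || (isupper(prev) && islower(next)))
                pending_space = (o > 0);
        }
        if (pending_space && o + 1 < cap)
            out[o++] = ' ';
        pending_space = 0;
        if (o + 1 < cap)
            out[o++] = (char)tolower(c);
    }
    out[o] = '\0';
    return o;
}
```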
New Edge Types & Detection
- `EMITS`/`LISTENS_ON` edges for Socket.IO, EventEmitter, and generic channel patterns
- Constant resolution: `const EVENT = "foo"; emit(EVENT)` resolves channel names through per-file constant tables
- `IMPORTS` edges with relative path resolution for JS/TS (`./foo`, `../bar`), Python (`.helpers`, `..utils`), Ruby
- `DATA_FLOWS` edges with argument-to-parameter mapping + field access chains
- Cross-service communication discovery + RAM-first incremental indexing
- AST-based route registration replacing prescan infrastructure
- HCL infrastructure binding extraction + prefix-decorator false positive fix
- Generalized route registration + infra binding bridge
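Resolving `emit(EVENT)` through a per-file constant table can be sketched as below: quoted literals are used directly, identifiers are looked up. The types and names are hypothetical.

```c
#include <stddef.h>
#include <string.h>

/* One entry of a hypothetical per-file constant table: const NAME = "value"; */
typedef struct {
    const char *name;
    const char *value;
} const_entry;

/*
 * Channel name for the first argument of emit(...)/on(...).
 * A quoted literal is unquoted into buf; an identifier is looked up in the
 * file's constant table. Returns NULL when the name cannot be resolved.
 */
const char *resolve_channel_arg(const char *arg, const const_entry *table, size_t n,
                                char *buf, size_t cap)
{
    size_t len = strlen(arg);
    if (len >= 2 && (arg[0] == '"' || arg[0] == '\'') && arg[len - 1] == arg[0]) {
        if (len - 2 >= cap)
            return NULL; /* literal does not fit */
        memcpy(buf, arg + 1, len - 2);
        buf[len - 2] = '\0';
        return buf;
    }
    for (size_t i = 0; i < n; i++)
        if (strcmp(table[i].name, arg) == 0)
            return table[i].value;
    return NULL;
}
```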
Graph Query & Tool Improvements
- 6 previously-ignored params wired up: `min_degree`, `max_degree`, `exclude_entry_points`, `include_connected`, `aspects` filter, `since` for detect_changes
- `include_tests` param on `trace_path`: mark test files in BFS results
- `risk_labels` on `trace_path` for security-sensitive path tracing
- `--progress` CLI flag for real-time indexing feedback
- `CBM_CACHE_DIR` env var for configurable database directory
- `moderate` index mode added to tool schema (between full and fast)
- Schema properties exposure for `param_names`, `param_types`, `decorators`
- `include_connected` fix: BFS inbound+outbound run separately (was merging incorrectly)
Quality of Life
- Nested .gitignore support: subdirectory gitignores now respected during indexing — critical for monorepos (#178)
- Skill consolidation: 4 separate skills merged into 1 with progressive disclosure
- Smart update: skip update when already on latest version
- Runtime binary detection in install command (no longer hardcoded)
- Git submodule support in watcher: detect dirty state inside submodules
- Fast→full mode change detection + auto-enable UI for ui-variant binary
- Layout endpoint: O(n*e) edge mapping replaced with binary search
- Layout JSON: handle invalid UTF-8 and NaN in serialization
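Replacing a per-edge linear scan with binary search means sorting node ids once and looking up each edge endpoint in O(log n), so e edges cost O(e log n) instead of O(n·e). A hypothetical sketch using the C standard library's `qsort` and `bsearch`:

```c
#include <stdlib.h>

/* Node record in a layout payload: graph id -> position in the node array. */
typedef struct {
    long long id;
    int index;
} node_pos;

static int cmp_node_pos(const void *a, const void *b)
{
    long long x = ((const node_pos *)a)->id;
    long long y = ((const node_pos *)b)->id;
    return (x > y) - (x < y);
}

/* Sort once by id: O(n log n). */
void index_nodes(node_pos *nodes, size_t n)
{
    qsort(nodes, n, sizeof *nodes, cmp_node_pos);
}

/* O(log n) lookup per edge endpoint; -1 when the id is unknown. */
int node_index_of(const node_pos *sorted, size_t n, long long id)
{
    node_pos key = {id, -1};
    const node_pos *hit = bsearch(&key, sorted, n, sizeof *sorted, cmp_node_pos);
    return hit ? hit->index : -1;
}
```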
Platform Fixes
- Windows: Zed/VS Code/KiloCode config paths, PATH delimiter, S_IXUSR check, agent detection using home_dir-relative paths, APPDATA-based userconfig test
- Linux portable: Alpine musl compatibility, security audits added to smoke tests, XDG_CONFIG_HOME in smoke environment
- Cross-platform vector blob assembly: preprocessor conditionals for macOS Mach-O / Linux ELF / Windows COFF
- C++ SEGV fix: NULL deref in LSP type resolver on large header files
Code Quality & Linting
- 337 linter warnings resolved across 16 files (named constants, cognitive complexity extraction)
- Cognitive complexity threshold set to industry default (25), 168 functions split
- `cbm_write_db` god-function split (569 → 325 lines)
- All NOLINTNEXTLINE suppressions eliminated, iterative AST walkers
- ASan leak fix in semantic corpus token_map
CI/CD & Security
- Decoupled security gate: security-static + CodeQL run independently, don't block test/build/smoke pipeline
- Security audits on ALL binary variants (standard + UI) — previously UI binaries were unaudited
- AV-safe token vocabulary: 11 heuristic-triggering words removed from Nomic embeddings
- CI split into reusable workflow components
- Vendored dependency bumps: SQLite 3.51.3, Mongoose 7.21, mimalloc 3.2.8
- Actions bumped: download-artifact v8.0.1, attest-sbom v2, cosign v4.1.1, msys2 v2.31.0, checkout v6.0.2, cache v5.0.4, upload-artifact v7.0.0, attest-build-provenance v4.1.0, codeql-action v4.35.1
Contributors
- @halindrome — Git submodule dirty state detection, risk_labels on trace_path
- @Koolerx — C# Interface registry fix, base_list handler, FTS5 BM25 search, JS/TS IMPORTS resolution, Channel schema
- @dLo999 — CBM_CACHE_DIR configurable database directory, skip-update-when-latest, nested .gitignore support (#178)
- @Selene29 — Layout binary search optimization, UTF-8/NaN serialization fix
- @slvnlrt — Windows PATH delimiter fix, runtime binary path detection
- @jimpark — Zed and VS Code Windows config path fixes
- @ahundt — Wire up silently-ignored search_graph params
- @maplenk — include_tests param on trace_path, search_graph param wiring
- @gdilla — Skill consolidation, risk_labels + --progress CLI flag
VirusTotal Scan Results
All release artifacts scanned — 0 detections across all engines.
| File | Engines | Detections | Report |
|---|---|---|---|
| codebase-memory-mcp-linux-amd64 | 64 | 0 | View |
| codebase-memory-mcp-linux-arm64 | 62 | 0 | View |
| codebase-memory-mcp-linux-amd64-portable | 64 | 0 | View |
| codebase-memory-mcp-linux-arm64-portable | 62 | 0 | View |
| codebase-memory-mcp-darwin-arm64 | 63 | 0 | View |
| codebase-memory-mcp-darwin-amd64 | 61 | 0 | View |
| codebase-memory-mcp-windows-amd64.exe | 71 | 0 | View |
| codebase-memory-mcp-ui-linux-amd64 | 64 | 0 | View |
| codebase-memory-mcp-ui-linux-arm64 | 61 | 0 | View |
| codebase-memory-mcp-ui-linux-amd64-portable | 64 | 0 | View |
| codebase-memory-mcp-ui-linux-arm64-portable | 63 | 0 | View |
| codebase-memory-mcp-ui-darwin-arm64 | 62 | 0 | View |
| codebase-memory-mcp-ui-darwin-amd64 | 61 | 0 | View |
| codebase-memory-mcp-ui-windows-amd64.exe | 71 | 0 | View |
| install.sh | 62 | 0 | View |
| install.ps1 | 62 | 0 | View |
| LICENSE | 61 | 0 | View |
v0.5.7
v0.5.7 — Stability, Install & Endurance Testing Overhaul
53 commits, 3 merged PRs, 10 bugs closed. The most significant stability release since the Go→C rewrite.
Database Concurrency Fix (Critical)
- Root cause found: three threads (MCP handler, autoindex, watcher) could corrupt the database; `rename(.db.tmp, .db)` over open SQLite connections produced 48K+ garbage rows
- Architecture change: `rename()` eliminated entirely. Indexing writes directly, reindexing deletes the old DB first, incremental upserts unchanged
- Pipeline lock serializes concurrent runs; corrupt DB auto-detected and cleaned
Install & Update
- `install.sh` + `install.ps1` included in every release archive with a `--skip-config` flag (#145)
- Kills stale MCP servers, strips macOS quarantine, ad-hoc signs binary
- Refreshes all 10 agent configs on every update
- In-memory zip extraction: no `unzip` needed on Windows
- Windows `.exe` path handling fixed across install, update, and uninstall
Windows Path Normalization (PR #146)
- Mixed path separators normalized to forward slashes at all entry points
- `cbm_normalize_path_sep()` works on ALL platforms (cross-platform DB files)
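A possible shape for separator normalization, sketched under the assumption that it is an in-place backslash-to-slash rewrite; `normalize_path_sep` is a hypothetical stand-in, and the real `cbm_normalize_path_sep()` signature is not documented here. Running it unconditionally on every OS is what keeps DB files portable between platforms.

```c
#include <assert.h>
#include <string.h>

/* Rewrite every '\' in path to '/' in place and return path.
 * Deliberately not #ifdef _WIN32: a path stored from Windows must look
 * the same when the DB is later opened on Linux or macOS. */
static char *normalize_path_sep(char *path) {
    assert(path != NULL);
    for (char *p = strchr(path, '\\'); p != NULL; p = strchr(p + 1, '\\')) {
        *p = '/';
    }
    return path;
}
```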
Soak Test Suite (New)
- Quick soak (10 min), ASan soak (15 min), weekly endurance (4h) — all per-platform
- RSS tracking, FD drift, query latency, crash recovery (kill -9 + clean restart)
- All soak tiers are release gates — no release ships without passing
Bug Fixes
- #139 Stack overflow in autoindex — 8MB default thread stack (thanks @theron-sapp for the detailed crash report with stack addresses, frequency table, and workaround!)
- #140 `index_repository` fails on Windows: this report by @Flipper1994 triggered the complete concurrency architecture overhaul!
- #137 `detect_changes` fails on paths with spaces (thanks @shekthesnek for the sharp observation that 12 tools worked but 1 didn't!)
- #135 macOS Gatekeeper blocks binary (thanks @heraque for the thorough xattr/spctl/codesign analysis!)
- #133 `search_code` rejects Windows backslash (thanks @ckelly8 for pinpointing the root cause!)
- #130 O(N²) import extractors hang on large files (thanks @halindrome for both the issue AND the fix in PR #131!)
- #127 Connection closed constantly — all crash paths fixed (thanks @kingofthebongo2008!)
- #145 Skip agent config in install scripts (thanks @sherif-fanous — implemented same day!)
- Arena buffer overflow, test detection gaps, memory leaks, CodeQL TOCTOU, taskkill self-kill, MSYS2 python3 path translation, vendored tre ssize_t
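The #139 fix gives worker threads an 8 MB stack. The sketch below shows how that can be requested with POSIX thread attributes. It assumes pthreads; `spawn_with_big_stack` and `demo_worker` are hypothetical names, not the project's code. The 1 MB frame in `demo_worker` would overflow macOS's 512 KB default for secondary threads, but it fits comfortably in 8 MB.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

#define WORKER_STACK_SIZE ((size_t)8 * 1024 * 1024) /* 8 MB */

/* Start fn(arg) on a new thread with an explicit 8 MB stack instead of
 * relying on the platform default. Returns 0 or a pthread error code. */
static int spawn_with_big_stack(pthread_t *out, void *(*fn)(void *), void *arg) {
    assert(out != NULL && fn != NULL);
    pthread_attr_t attr;
    int rc = pthread_attr_init(&attr);
    if (rc != 0) return rc;
    rc = pthread_attr_setstacksize(&attr, WORKER_STACK_SIZE);
    if (rc == 0) rc = pthread_create(out, &attr, fn, arg);
    pthread_attr_destroy(&attr);
    return rc;
}

/* Hypothetical deep-stack worker: a 1 MB local frame, the kind of usage
 * that crashes on a small default thread stack. */
static void *demo_worker(void *arg) {
    volatile char buf[1024 * 1024];
    buf[0] = 1;
    buf[sizeof buf - 1] = 2;
    *(int *)arg = buf[0] + buf[sizeof buf - 1];
    return NULL;
}
```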
Testing
- 2586 unit tests (up from 2042), zero skipped, zero memory leaks
- 480+ new tests covering arena, FQN, graph buffer, MCP dispatch, pipeline, store, YAML, watcher
- 15-phase smoke suite on all platforms including Windows
- Soak tests as release gate — endurance verified before every release
Security
- Install scripts in VirusTotal scan alongside binaries (120 min timeout, all files must pass)
- `system()` eliminated from all production code
- Vendored dependency integrity checksums enforced
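Replacing `system()` means never handing a command string to `/bin/sh`. Here is a minimal POSIX sketch of the fork+execvp approach described for `cbm_exec_no_shell()`. `exec_no_shell` is a hypothetical stand-in whose signature and error handling are assumptions. Because arguments go straight to `execvp`, shell metacharacters in them are passed through literally.

```c
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Run argv[0] with argv[1..] directly, without a shell.
 * Returns the child's exit status, or -1 on fork/wait failure or abnormal exit. */
static int exec_no_shell(char *const argv[]) {
    assert(argv != NULL && argv[0] != NULL);
    pid_t pid = fork();
    if (pid < 0) return -1;
    if (pid == 0) {
        execvp(argv[0], argv); /* searches PATH; no /bin/sh involved */
        _exit(127);            /* only reached if execvp failed */
    }
    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR) return -1;
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

For example, `{"test", "x; exit 3", "=", "x; exit 3", NULL}` simply compares two equal strings and exits 0. The `; exit 3` is never interpreted.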
Contributors 🙏
Every bug report and PR made this release better. Thank you:
| Contributor | Contribution |
|---|---|
| @halindrome | O(N²) import fix (PR #131) — merged |
| @jimpark | Windows path normalization (PR #146) — merged |
| @chitralverma | OpenCode config format fix (PR #134) — merged |
| @theron-sapp | Stack overflow crash report (#139) — fixed |
| @Flipper1994 | Windows rename failure (#140) — fixed, triggered concurrency overhaul |
| @shekthesnek | Windows path-with-spaces (#137) — fixed |
| @heraque | macOS quarantine analysis (#135) — fixed |
| @ckelly8 | Windows backslash root cause (#133) — fixed |
| @kingofthebongo2008 | Connection stability (#127) — fixed |
| @sherif-fanous | Skip-config feature request (#145) — implemented |
Security Verification
All release binaries have been independently verified:
VirusTotal — scanned by 70+ antivirus engines:
| Binary | Scan |
|---|---|
| install.sh | View Report |
| install.ps1 | View Report |
| codebase-memory-mcp-windows-amd64.exe | View Report |
| codebase-memory-mcp-ui-windows-amd64.exe | View Report |
| codebase-memory-mcp-ui-linux-arm64 | View Report |
| codebase-memory-mcp-ui-linux-amd64 | View Report |
| codebase-memory-mcp-ui-darwin-arm64 | View Report |
| codebase-memory-mcp-ui-darwin-amd64 | View Report |
| codebase-memory-mcp-linux-arm64 | View Report |
| codebase-memory-mcp-linux-amd64 | View Report |
| codebase-memory-mcp-darwin-arm64 | View Report |
| codebase-memory-mcp-darwin-amd64 | View Report |
| LICENSE | View Report |

Build Provenance (SLSA): cryptographic proof that each binary was built by GitHub Actions from this repo:
`gh attestation verify <downloaded-file> --repo DeusData/codebase-memory-mcp`
Sigstore cosign (keyless signature verification):
`cosign verify-blob --bundle <file>.bundle <file>`
Native antivirus scans — all binaries passed these scans before this release was created (any detection would have blocked the release):
- Windows: Windows Defender with ML heuristics (the same engine end users run)
- Linux: ClamAV with daily signature updates
- macOS: ClamAV with daily signature updates
SBOM — Software Bill of Materials (sbom.json) lists all vendored dependencies.
See SECURITY.md for full details.
v0.5.6
What's New in v0.5.6
search_code v2 — Graph-Augmented Code Search
The search_code tool has been completely rewritten with a 4-phase pipeline that combines grep speed with knowledge graph intelligence:
- 3 output modes: `compact` (default: function names + match lines), `full` (complete function bodies with highlighted matches), `files` (file list with match counts)
- Graph ranking: results ranked by structural importance (definitions first, popular functions next, tests last)
- Block expansion: grep matches automatically expanded to containing function boundaries, so no more fragmented line snippets
- `path_filter`: scope searches to specific directories (e.g., `src/` only)
- `context` lines: configurable context around matches in `full` mode
- Directory distribution summary: shows which directories contain matches
Falls back gracefully to raw grep when the project isn't indexed.
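The graph ranking rule (definitions first, popular functions next, tests last) can be expressed as a `qsort` comparator. This is a sketch under assumed semantics: the `search_hit` struct and its fields are hypothetical, and "popularity" is taken to mean caller count (graph in-degree).

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical search hit; not the project's real type. */
typedef struct {
    const char *name;
    int is_definition; /* 1 if the match is the symbol's own definition */
    int is_test;       /* 1 if the hit lives in a test file */
    int in_degree;     /* number of callers in the graph ("popularity") */
} search_hit;

/* Tier 0 = definitions, 1 = other code, 2 = tests (always last). */
static int hit_tier(const search_hit *h) {
    if (h->is_test) return 2;
    return h->is_definition ? 0 : 1;
}

static int compare_hits(const void *a, const void *b) {
    const search_hit *x = a, *y = b;
    int tx = hit_tier(x), ty = hit_tier(y);
    if (tx != ty) return tx - ty;
    /* Within a tier, more callers ranks higher (descending in_degree). */
    return (y->in_degree > x->in_degree) - (y->in_degree < x->in_degree);
}

static void rank_hits(search_hit *hits, size_t n) {
    if (n < 2) return;
    assert(hits != NULL);
    qsort(hits, n, sizeof *hits, compare_hits);
}
```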
Kubernetes & Kustomize Indexing
Full infrastructure-as-code support for Kubernetes manifests:
- Parses Deployments, Services, ConfigMaps, Secrets, Ingress, CronJobs, and 20+ resource types
- Kustomize overlay resolution (base → overlay relationships)
- Resource nodes appear in the knowledge graph with labels, namespaces, and container specs
- New `Resource` node label in graph schema
User-Defined Extension Mappings
Custom file extension → language mappings via .codebase-memory.json (project-level) or $XDG_CONFIG_HOME/codebase-memory-mcp/config.json (global):
`{"extra_extensions": {".blade.php": "php", ".mjs": "javascript"}}`

Project config takes priority over global config.
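The following is a sketch of how such a mapping lookup might work, assuming two rules that the notes do not spell out. First, the longest matching suffix wins, so `.blade.php` beats a plain `.php`. Second, the project table is consulted before the global one. `ext_mapping`, `match_ext`, and `resolve_lang` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

typedef struct {
    const char *ext;  /* e.g. ".blade.php" */
    const char *lang; /* e.g. "php" */
} ext_mapping;

/* Language for the longest extension in table that filename ends with, or NULL. */
static const char *match_ext(const char *filename, const ext_mapping *table, size_t n) {
    size_t flen = strlen(filename), best_len = 0;
    const char *best = NULL;
    for (size_t i = 0; i < n; i++) {
        size_t elen = strlen(table[i].ext);
        if (elen <= flen && elen > best_len &&
            strcmp(filename + flen - elen, table[i].ext) == 0) {
            best = table[i].lang;
            best_len = elen;
        }
    }
    return best;
}

/* Project-level mappings first; fall back to global only if nothing matched. */
static const char *resolve_lang(const char *filename,
                                const ext_mapping *project, size_t np,
                                const ext_mapping *global, size_t ng) {
    assert(filename != NULL);
    const char *lang = match_ext(filename, project, np);
    return lang != NULL ? lang : match_ext(filename, global, ng);
}
```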
Security Fixes
- SQL injection in store search/BFS and argument injection in HTTP server (#124, @map588)
- Use-after-free in `handle_manage_adr` get path (#126, @halindrome)
- Ghost .db file prevention: query handlers now verify the project exists before opening SQLite, which prevents empty database files from accumulating (#120)
- Binary replacement: new `cbm_replace_binary` with an unlink-before-write pattern; handles read-only targets and a Windows rename-aside fallback (#114)
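Why unlink before writing? `unlink` removes only the directory entry. A read-only file can still be removed if its directory is writable, and a process still executing the old binary keeps its inode alive. The POSIX-only sketch below illustrates this; `replace_binary` is a hypothetical stand-in for `cbm_replace_binary`, and it omits the Windows rename-aside fallback entirely.

```c
#include <assert.h>
#include <errno.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/stat.h>
#include <sys/types.h>
#include <unistd.h>

/* Replace the file at path with len bytes from data. Returns 0 or -1. */
static int replace_binary(const char *path, const void *data, size_t len) {
    assert(path != NULL && (data != NULL || len == 0));
    /* Drop the old directory entry first; a missing file is fine. */
    if (unlink(path) != 0 && errno != ENOENT) return -1;
    int fd = open(path, O_WRONLY | O_CREAT | O_EXCL, 0755);
    if (fd < 0) return -1;
    const char *p = data;
    while (len > 0) {
        ssize_t n = write(fd, p, len);
        if (n < 0) {
            if (errno == EINTR) continue;
            close(fd);
            return -1;
        }
        p += n;
        len -= (size_t)n;
    }
    /* open() is subject to umask; force the executable bits explicitly. */
    if (fchmod(fd, 0755) != 0) {
        close(fd);
        return -1;
    }
    return close(fd);
}
```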
Stability & Compatibility Fixes
- MCP stdio buffering: fixed `poll()`/`getline()` FILE* mismatch that caused `tools/list` to hang on some clients (#99, @halindrome)
- SQLite WAL busy_timeout: set before `journal_mode=WAL` to prevent `SQLITE_BUSY` on lock contention (#117, @halindrome)
- Import parser O(N²) → O(N): replaced indexed `ts_node_child()` loop with a `TSTreeCursor` walk, which fixes quadratic slowdown on files with many imports (#107, @halindrome)
- Session project name mismatch: `detect_session` now uses the same `cbm_project_name_from_path()` as the pipeline
- Windows: UI zip filename fix, `setenv`/`unsetenv` compat wrappers, `USERPROFILE` fallback when `HOME` is unset
- Linux: add `-D_GNU_SOURCE` for `strcasestr` visibility (#111, @trollkotze)
- libgit2: fix `-Wmissing-field-initializers` build error (#91, @jsyrjala)
- Memory leak: `resolve_store` leaked a SQLite connection when querying an unlinked `.db` after `delete_project`
Comprehensive Smoke Tests
Expanded from 4 phases to 7, covering the full binary lifecycle:
- Phase 5: MCP stdio transport — initialize handshake, tools/list, tool call round-trip, Content-Length framing (OpenCode compatibility)
- Phase 6: CLI subcommands: install/uninstall/update `--dry-run`, config set/get/reset, simulated binary replacement with read-only edge case
- Phase 7: MCP advanced tool calls: search_code v2, get_code_snippet via JSON-RPC
Smoke tests now run in Docker test infrastructure (test-infrastructure/run.sh smoke) and in CI on all 10 platform×variant combinations.
Update Command Improvements
- `--dry-run` flag: shows what would happen without downloading or modifying files
- `--standard`/`--ui` flags: skip the interactive variant prompt (CI-friendly)
- Restart reminder after a successful update
CI & Infrastructure
- Pinned GitHub Actions to commit SHAs (dependabot: VirusTotal 5.0.0, setup-node 6.3.0, cosign-installer, attest-build-provenance)
- Docker test infra: `smoke` and `smoke-amd64` services for local cross-platform smoke testing
- Cleaned up Go-era artifacts, updated THIRD_PARTY.md for pure C project
Contributors
A huge thank you to everyone who contributed to this release:
- @halindrome: Outstanding contributions across the board: K8s/Kustomize indexing, user-defined extension mappings, MCP stdio fix, WAL ordering fix, ghost .db prevention, use-after-free fix, O(N²) import parser fix, and WAL journal mode fix. The backbone of this release.
- @map588: Critical SQL injection and argument injection security fix
- @trollkotze: Linux build fix for `strcasestr` visibility
- @jsyrjala: Build fix for libgit2 field initializers
- @bingh0: VS Code compatibility fixes (schema validation, install registration, protocol negotiation)
Thank you all for making codebase-memory-mcp better!
Security Verification
All release binaries have been independently verified:
VirusTotal — scanned by 70+ antivirus engines:
| Binary | Scan |
|---|---|
| codebase-memory-mcp-windows-amd64.exe | View Report |
| codebase-memory-mcp-ui-windows-amd64.exe | View Report |
| codebase-memory-mcp-ui-linux-arm64 | View Report |
| codebase-memory-mcp-ui-linux-amd64 | View Report |
| codebase-memory-mcp-ui-darwin-arm64 | View Report |
| codebase-memory-mcp-ui-darwin-amd64 | View Report |
| codebase-memory-mcp-linux-arm64 | View Report |
| codebase-memory-mcp-linux-amd64 | View Report |
| codebase-memory-mcp-darwin-arm64 | View Report |
| codebase-memory-mcp-darwin-amd64 | View Report |
| LICENSE | View Report |

Build Provenance (SLSA): cryptographic proof that each binary was built by GitHub Actions from this repo:
`gh attestation verify <downloaded-file> --repo DeusData/codebase-memory-mcp`
Sigstore cosign (keyless signature verification):
`cosign verify-blob --bundle <file>.bundle <file>`
Native antivirus scans — all binaries passed these scans before this release was created (any detection would have blocked the release):
- Windows: Windows Defender with ML heuristics (the same engine end users run)
- Linux: ClamAV with daily signature updates
- macOS: ClamAV with daily signature updates
SBOM — Software Bill of Materials (sbom.json) lists all vendored dependencies.
See SECURITY.md for full details.
v0.5.5
Security
- CodeQL SAST — static analysis with build-mode manual (100% source coverage), zero open alerts gate
- Shell injection elimination: replaced `system()` calls with `cbm_exec_no_shell()` (fork+execvp), so no tainted data reaches a shell
- snprintf overflow fixes: 11 buffer overflow vulnerabilities fixed (clamp offset after each append)
- TOCTOU race fixes — atomic file permissions, open-then-fstat pattern
- 31 security defense tests — shell injection, SQLite authorizer, SQL injection via Cypher, path containment, shell-free exec
- Fuzz testing — random/mutated JSON-RPC + Cypher inputs on every build
- Native antivirus scanning — Windows Defender, ClamAV (Linux + macOS) on every build
- VirusTotal zero-tolerance gate — all release binaries scanned by 70+ engines before publish
- SLSA provenance + Sigstore cosign + SBOM (SPDX 2.3) + SHA-256 checksums
- GitHub Actions pinned to SHA with Dependabot
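The snprintf fix hinges on one detail. `snprintf` returns the length it *wanted* to write, not what actually fit. Adding that value blindly to a running offset can push the offset past the buffer end, and the next append then writes out of bounds. The following is a minimal sketch of the "clamp offset after each append" pattern; `buf_appendf` is a hypothetical helper, not the project's function.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Append formatted text at buf[*off] (buffer capacity cap, cap > 0).
 * Returns 0 on success, -1 if the output was truncated. In both cases
 * *off stays <= cap - 1, so the next call is still in bounds. */
static int buf_appendf(char *buf, size_t cap, size_t *off, const char *fmt, ...) {
    assert(buf != NULL && off != NULL && cap > 0 && *off < cap);
    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *off, cap - *off, fmt, ap);
    va_end(ap);
    if (n < 0) return -1;
    size_t room = cap - *off - 1; /* characters that actually fit */
    if ((size_t)n > room) {
        *off = cap - 1; /* clamp: buffer is full, not past the end */
        return -1;
    }
    *off += (size_t)n;
    return 0;
}
```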
Antivirus false positive prevention
Added multi-layer AV scanning to the build pipeline to catch and prevent false positives before they reach users. Removed DLL resolve tracking strings that triggered heuristic detection. Every binary in this release has been verified clean by 70+ antivirus engines via VirusTotal. (Fixes #89)
New features
- Content-Length framed transport — OpenCode compatibility
- 10 agent detection — OpenClaw + VS Code support
- Dual MCP config location: `~/.claude/.mcp.json` + `~/.claude.json`
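Content-Length framing (the LSP-style `Content-Length: N\r\n\r\n<body>` header) lets a reader know exactly how many body bytes to consume. The following is a simplified sketch of writing and parsing one such frame. It assumes `Content-Length` is the first header and ignores others. `frame_message` and `parse_frame` are hypothetical names; this is not the project's transport code.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Write json as one frame into out. Returns the frame length, or -1 if out is too small. */
static int frame_message(char *out, size_t cap, const char *json) {
    assert(out != NULL && json != NULL);
    int n = snprintf(out, cap, "Content-Length: %zu\r\n\r\n%s", strlen(json), json);
    return (n < 0 || (size_t)n >= cap) ? -1 : n;
}

/* Parse one frame from buf[0..len). On success sets *body/*body_len and
 * returns bytes consumed; 0 = incomplete (read more), -1 = malformed. */
static long parse_frame(const char *buf, size_t len, const char **body, size_t *body_len) {
    assert(buf != NULL && body != NULL && body_len != NULL);
    static const char key[] = "Content-Length:";
    const size_t key_len = sizeof key - 1;
    const char *hdr_end = NULL;
    for (size_t i = 0; i + 4 <= len; i++) {
        if (memcmp(buf + i, "\r\n\r\n", 4) == 0) { hdr_end = buf + i; break; }
    }
    if (hdr_end == NULL) return 0;                       /* header incomplete */
    if ((size_t)(hdr_end - buf) <= key_len || strncmp(buf, key, key_len) != 0) return -1;
    const char *num = buf + key_len;
    char *end = NULL;
    unsigned long n = strtoul(num, &end, 10);
    if (end == num || end > hdr_end) return -1;          /* digits must be inside the header */
    size_t header_len = (size_t)(hdr_end - buf) + 4;
    if (len - header_len < n) return 0;                  /* body incomplete */
    *body = hdr_end + 4;
    *body_len = (size_t)n;
    return (long)(header_len + n);
}
```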
Bug fixes
- Fix Swift call extraction: 0 CALLS edges (#43)
- Fix Laravel route false positives: extension scoping + path filter (PR #65)
- Port FastAPI Depends() edge tracking (PR #66)
- Keep WAL journal mode during bulk write (PR #72)
- Fix VS Code compatibility (PR #79)
- Remove DLL resolve tracking (Windows Defender false positive)
Contributors
Thanks to @halindrome, @bingh0, @mariomeyer, @kingchenc for code contributions, and @Maton-Nenoso for reporting #89 which led to the comprehensive AV scanning infrastructure in this release.
Security Verification
All release binaries have been independently verified:
VirusTotal — scanned by 70+ antivirus engines:
| Binary | Scan |
|---|---|
| codebase-memory-mcp-darwin-amd64 | View Report |
| codebase-memory-mcp-darwin-arm64 | View Report |
| codebase-memory-mcp-linux-amd64 | View Report |
| codebase-memory-mcp-linux-arm64 | View Report |
| codebase-memory-mcp-ui-darwin-amd64 | View Report |
| codebase-memory-mcp-ui-darwin-arm64 | View Report |
| codebase-memory-mcp-ui-linux-amd64 | View Report |
| codebase-memory-mcp-ui-linux-arm64 | View Report |
| codebase-memory-mcp-ui.exe | View Report |
| codebase-memory-mcp-windows-amd64.exe | View Report |
| LICENSE | View Report |

Build Provenance (SLSA): cryptographic proof that each binary was built by GitHub Actions from this repo:
`gh attestation verify <downloaded-file> --repo DeusData/codebase-memory-mcp`
Sigstore cosign (keyless signature verification):
`cosign verify-blob --bundle <file>.bundle <file>`
Native antivirus scans — all binaries passed these scans before this release was created (any detection would have blocked the release):
- Windows: Windows Defender with ML heuristics (the same engine end users run)
- Linux: ClamAV with daily signature updates
- macOS: ClamAV with daily signature updates
SBOM — Software Bill of Materials (sbom.json) lists all vendored dependencies.
See SECURITY.md for full details.
v0.5.3
Incremental Reindex
Auto-detects previously indexed projects and re-parses only changed files.
- mtime+size classification against stored hashes
- Surgical node deletion (edges cascade), re-parse only deltas
- Instant no-op (<1ms) when nothing changed
- Auto-routes: first run = full RAM pipeline, subsequent = incremental disk
| Scenario | Time |
|---|---|
| Nothing changed | <1ms |
| 1 file modified | ~2ms |
| 1 file added/deleted | ~1ms |
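The mtime+size classification can be sketched as a pure comparison between what was stored at the last index run and what `stat(2)` reports now. This sketch covers only the cheap metadata check; how the stored hashes participate is not described above and is left out. `file_meta`, `classify`, and `meta_from_disk` are hypothetical names.

```c
#include <assert.h>
#include <sys/stat.h>
#include <time.h>

typedef enum { FILE_UNCHANGED, FILE_MODIFIED, FILE_ADDED, FILE_DELETED } file_change;

/* Hypothetical per-file metadata; present == 0 means "no such file". */
typedef struct {
    int present;
    time_t mtime;
    long long size;
} file_meta;

/* Compare stored metadata with the current on-disk metadata. */
static file_change classify(const file_meta *stored, const file_meta *on_disk) {
    assert(stored != NULL && on_disk != NULL);
    if (!stored->present && !on_disk->present) return FILE_UNCHANGED;
    if (!stored->present) return FILE_ADDED;
    if (!on_disk->present) return FILE_DELETED;
    if (stored->mtime != on_disk->mtime || stored->size != on_disk->size)
        return FILE_MODIFIED;
    return FILE_UNCHANGED; /* only these files can be skipped without re-parsing */
}

/* Read current metadata; a missing file is reported as not present. */
static file_meta meta_from_disk(const char *path) {
    file_meta m = {0, 0, 0};
    struct stat st;
    if (stat(path, &st) == 0) {
        m.present = 1;
        m.mtime = st.st_mtime;
        m.size = (long long)st.st_size;
    }
    return m;
}
```

A "nothing changed" run then reduces to one `stat` per file and no parsing, which is consistent with the sub-millisecond no-op timing in the table above.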
ADR Hints
- index_repository: adr_present + adr_hint when no ADR exists
- get_graph_schema: adr_present + adr_hint per project
- manage_adr GET: creation hint when no ADR
Simplified get_code_snippet
Streamlined to exact qualified-name (QN) + suffix matching. Points users to `search_graph` when the symbol is not found.
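One plausible reading of "exact QN + suffix matching" is sketched below. An exact qualified-name match wins. Otherwise the query may match the tail of a QN on a `.` boundary, and an ambiguous or missing match returns NULL so the caller can suggest `search_graph`. The boundary and ambiguity rules, and the names `qn_matches` and `resolve_symbol`, are assumptions for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* 1 if qn == query, or qn ends with "." followed by query. */
static int qn_matches(const char *qn, const char *query) {
    assert(qn != NULL && query != NULL);
    size_t ql = strlen(query), nl = strlen(qn);
    if (ql == 0 || ql > nl) return 0;
    if (ql == nl) return strcmp(qn, query) == 0;
    return qn[nl - ql - 1] == '.' && strcmp(qn + nl - ql, query) == 0;
}

/* Exact match first; else a unique suffix match; else NULL. */
static const char *resolve_symbol(const char *const *qns, size_t n, const char *query) {
    const char *suffix_hit = NULL;
    size_t suffix_count = 0;
    for (size_t i = 0; i < n; i++) {
        if (strcmp(qns[i], query) == 0) return qns[i];
        if (qn_matches(qns[i], query)) {
            suffix_hit = qns[i];
            suffix_count++;
        }
    }
    return suffix_count == 1 ? suffix_hit : NULL;
}
```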
Upgrading
```bash
codebase-memory-mcp update
```
v0.5.2
Fixes
- Release RAM after indexing: Call `mi_collect(true)` after pipeline completion to return mimalloc pages to the OS. On Linux this immediately reduces RSS; on macOS pages are marked reusable (cosmetically retained until memory pressure).
- Standalone Windows binary: Add `-static` to Windows linker flags. The binary no longer requires `libc++.dll`, `libunwind.dll`, or any MSYS2/CLANG64 runtime DLLs — fully self-contained .exe.
Upgrading
```bash
codebase-memory-mcp update
```