Releases: DeusData/codebase-memory-mcp

v0.8.1

12 Jun 06:36

A focused follow-up to v0.8.0.

First-party HTTP server

The graph-UI web server has been reimplemented from scratch as a lean in-house module — a refactor that removes the last third-party server library from the binary. The new transport (src/ui/httpd.c) is purpose-built for what the UI actually needs:

  • Localhost-only by construction — binds 127.0.0.1 exclusively, with platform-correct socket options on every OS.
  • Strict HTTP/1.1 parsing — hard request caps (16 KB head / 1 MB body), strict CRLF handling, raw path matching, and a per-connection receive deadline.
  • Simple by design — one request per connection (Connection: close); no keep-alive state machine, no chunked encoding.
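
The head cap and the strict CRLF rule listed above can be illustrated with a small sketch. This is not the code in src/ui/httpd.c: the function name `find_head_end`, the status codes, and the return convention are assumptions made for illustration. Only the 16 KB head limit is taken from the notes.

```c
#include <stddef.h>

/* Illustrative cap mirroring the "16 KB head" limit in the notes. */
#define MAX_HEAD_BYTES (16 * 1024)

/* Status codes for this sketch (hypothetical, not from httpd.c). */
enum head_status {
    HEAD_INCOMPLETE = 0,
    HEAD_TOO_LARGE = -1,
    HEAD_BAD_LINE_ENDING = -2
};

/*
 * Scan a receive buffer for the end of the request head ("\r\n\r\n").
 * Returns the head length (terminator included) when the head is complete,
 * HEAD_INCOMPLETE when more bytes are needed, or a negative status.
 * A '\n' that is not preceded by '\r' is rejected (strict CRLF).
 */
long find_head_end(const char *buf, size_t len) {
    size_t limit = len < MAX_HEAD_BYTES ? len : MAX_HEAD_BYTES;
    for (size_t i = 0; i < limit; i++) {
        if (buf[i] != '\n') continue;
        if (i == 0 || buf[i - 1] != '\r') return HEAD_BAD_LINE_ENDING;
        /* This LF closes a CRLF that directly follows another CRLF. */
        if (i >= 3 && buf[i - 2] == '\n' && buf[i - 3] == '\r') return (long)(i + 1);
    }
    /* No terminator inside the cap: either keep reading or give up. */
    return len >= MAX_HEAD_BYTES ? HEAD_TOO_LARGE : HEAD_INCOMPLETE;
}
```

A caller in this style would append received bytes to a buffer, call `find_head_end` after each read, and close the connection on any negative status. With one request per connection there is no leftover pipelined data to carry over.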

A new 28-test transport and routing suite covers the parsing edge cases, CORS policy, RPC dispatch, and shutdown behavior. All routes, status codes, and the localhost-only CORS policy behave exactly as before.

Also in this release

  • Slimmer grammar set — the nim grammar (by far the largest vendored grammar at 66 MB) was dropped; 158 languages remain supported.
  • Attribution bundle in every archive — release archives now include THIRD_PARTY_NOTICES.md, a consolidated file with the license texts of all vendored components. Homebrew and AUR installs place it alongside the binary.
  • Richer SBOM — per-component license and version metadata for everything compiled into the binary.
  • Test suite: 5,604 tests (+27 vs v0.8.0).

Update

codebase-memory-mcp update

or grab the binaries below — every asset is signed, checksummed, and attested as usual.

Security Verification

All release binaries scanned with 70+ antivirus engines — 0 detections.

Binary SHA-256 VirusTotal
darwin-amd64 6cd48f218bf135a7b831... 0/72 ✅
darwin-arm64 595cedd259200424f3d9... 0/72 ✅
linux-amd64 58a0e1a968dacd324cde... 0/72 ✅
linux-arm64 45d5c33d59d55b998b98... 0/72 ✅
windows-amd64 12375e6a39a31f003d77... 0/72 ✅

v0.8.0

12 Jun 07:49

📦 Archived release page — binaries for this version are no longer published. Get the latest release.

The headline: three new type-aware LSP engines. Java, Kotlin, and Rust now get full hybrid LSP resolution — type-aware call resolution through class hierarchies, annotation/attribute linkage, signature-aware overload matching, generics handling, and cross-file type inference — joining C/C++, Python, TypeScript/JavaScript, Go, C#, and PHP for 9 languages under hybrid LSP.

Architecture intelligence

  • Leiden community detection: get_architecture now surfaces multi-level Leiden communities (replacing single-level Louvain) — real module boundaries on kernel-sized graphs, computed in seconds instead of minutes.
  • Queryable computation-bottleneck metrics: functions carry complexity, loop-depth, and scan-pattern metrics in the graph — you can now Cypher-query your codebase for its own algorithmic hotspots. (We used exactly this to find and fix several of our own, see below.)
  • Helm support: chart templates, include call linkage, and Chart.yaml dependency edges; HCL block labels folded into node names; C/C++ preprocessor macros are now first-class Macro nodes (2.3M of them in the Linux kernel).

Stability — the biggest hardening push so far

The release fleet now indexes 16 large OSS repos end-to-end — the Linux kernel (4.8M nodes), elasticsearch, kubernetes, the TypeScript compiler, bitcoin, zig, symfony, and more — with database-level validity gates. That hunt, plus outstanding community reports, closed a long list of crashes:

  • C++/templated-code crash family eliminated: uninitialized template-argument arrays, out-of-bounds parameter lookups, unbounded resolve recursion, expression-evaluation blowups — reported across #424, #427/#428/#432, #312, #322/#323, #336, #344, #355, #360, #385.
  • Allocator unification (#424): the whole binary — vendored tree-sitter, SQLite, libgit2 — now runs under one allocator, ending a class of cross-allocator heap corruption on complex C++.
  • Startup crash-loop fixed (#235, #439): a stack-buffer overflow in the project-list error path could SIGABRT the server for all projects; one corrupt cache DB no longer takes down every session.
  • Database integrity, end to end: oversized index keys now spill to overflow pages correctly, all properties JSON is escaped and validated at the producer, the url_path index always matches its generated column, and WAL files are checkpointed on close/startup (#387). Every database in the 16-repo fleet passes PRAGMA integrity_check clean.
  • Server lifecycle: stdio servers exit with their parent via a death watchdog (#406), MCP ping is supported (#354), JSON-RPC string ids are preserved (#253), log level is configurable via CBM_LOG_LEVEL (#413).
  • Windows: wide-char API for non-ASCII paths (#386), cmd.exe-compatible git history pass (#324), PATHEXT probing (#221), packaging-walk hang fixed — the full test suite is green on Windows.
  • Plus: phantom gRPC routes (#294), moderate-mode silently dropping subtrees (#411) with excluded directories now reported in indexing results, validator-safe project names (#349), UI flag diagnostics (#350).

Performance at kernel scale

  • Memory rework: streaming SQLite dump writer, string interning, allocator page reclaim, and extraction back-pressure — the full Linux kernel indexes in ~4 minutes on a laptop, within a bounded memory budget; cgroup-aware CPU/memory detection and a CBM_WORKERS override (#364/#365) make container behavior predictable.
  • Quadratic hot paths eliminated, several found by pointing the new bottleneck metrics at our own codebase: C/C++ cross-LSP overload resolution (#410), registry construction and lookup paths in the PHP/Python/Java/Kotlin resolvers, and architecture-boundary computation. Large PHP and Java monorepos see order-of-magnitude faster full indexes (symfony 35x, elasticsearch 9x vs. this cycle's early builds).
  • Graph quality improved throughout these fixes — resolution coverage and edge fidelity are better in every supported language.
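
The CBM_WORKERS override mentioned above can be sketched as "a valid override wins, otherwise use the detected count". The helper name `resolve_worker_count`, the assumption that the variable holds a positive decimal integer, and the upper bound of 1024 are all hypothetical; the notes only state that the override exists.

```c
#include <errno.h>
#include <stdlib.h>

/*
 * Pick a worker count. `override` is the raw environment value (may be
 * NULL), for example the result of getenv("CBM_WORKERS"). Treating it as
 * a positive decimal integer is an assumption of this sketch.
 * `detected_cpus` stands in for whatever the cgroup-aware detection found.
 */
int resolve_worker_count(const char *override, int detected_cpus) {
    int fallback = detected_cpus > 0 ? detected_cpus : 1;
    if (override == NULL || *override == '\0') return fallback;

    char *end = NULL;
    errno = 0;
    long value = strtol(override, &end, 10);
    /* Reject trailing garbage, overflow, and out-of-range values. */
    if (errno != 0 || *end != '\0' || value <= 0 || value > 1024) return fallback;
    return (int)value;
}
```

Falling back silently on a malformed value keeps startup predictable inside containers; a stricter variant could log the rejected value instead.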

Query layer

  • Cypher: EXISTS {} predicates in WHERE, COUNT(DISTINCT) (#239), label alternation (n:A|B) (#242), label tests in WHERE (#241), WITH DISTINCT (#238), string/scalar/entity-introspection functions, multi-argument scalars — and unsupported syntax now fails loudly instead of returning silent empties (#373).
  • search_code: invalid regex errors instead of empty results (#283), & in paths, timing in responses, wildcard-free file_pattern as substring (#200), .mjs/.cjs/.mts/.cts indexed (#197), Blade templates mapped (#258), R box::use()/library() imports (#218/#219).

A much bigger safety net

  • New per-grammar regression suite across all 159 grammars plus probe suites for extraction, edge creation, and LSP resolution — the suite now runs 5,500+ tests under a strict no-skip policy (skips are hard failures).
  • Crash reproductions execute as subprocess-isolated tests with exit-signal assertions; ASan/LSan run across the suite; the macOS Intel build leg is now blocking so darwin-amd64 always ships.

Distribution & integrations

  • Published to the official MCP Registry and the Glama directory; portable static Linux binary on all install/update paths.
  • install --plan machine-readable receipts before any config mutation (#388), Cursor IDE detection (#222), fish-native PATH handling (#319), $CLAUDE_CONFIG_DIR respected (#321), SessionStart hooks for Codex, Gemini CLI, and Antigravity (#330), ADR storage unified between MCP tools and the UI (#256).

🙏 Code contributors

This release contains code from 16 community contributors — both directly merged PRs and PRs we distilled onto main (distilled PRs are credited to their authors; thank you for the patience with that workflow):

  • @mattall — defensive type validation (PR #427) and out-of-line C++ method attribution (PR #428), plus the precise follow-up diagnosis in #432
  • @jjserenity — Windows wide-char path support (#386) and WAL checkpointing (#387), with HC Cheng co-authoring
  • @cmeerw — MCP ping support (#354), CLOCK_MONOTONIC timing (#342)
  • @romanornr — pending-template-call guard (#322) and expression-evaluation cap (#323)
  • @yangsec888 (Sam Li) — cgroup-aware CPU/memory detection + CBM_WORKERS override (PR #365)
  • @nvt-pankajsharma — parent-death watchdog for the stdio server (PR #407)
  • @santanusinha — environment-driven log level (PR #414)
  • @casualjim — TS-LSP tuple-return arena fix (PR #374) and arena NULL guard
  • @Smiie-2 — C++ template default-argument segfault fix (PR #360)
  • @jjoos — growable node stacks in channel extraction (PR #339)
  • @halindrome (Shane McCarron) — $CLAUDE_CONFIG_DIR support (PR #321)
  • @hippus (Alexey Z) — Windows cmd.exe-compatible git history (PR #325)
  • @code-by-mahereddy — security & correctness fixes from a codebase audit (PR #352)
  • @cyprien-bussy — workspace import resolution + SvelteKit route extraction (PR #369)
  • @utilia-ai-wox — workspace/pkgmap route fixes

🙏 Issue reporters

Precise, reproducible reports drove this release's stability work: @ZapAndersson (#424), @mattall (#427/#428/#432), @Charles5277 (#439), @awconstable (#235), @mattepiu (#410), @nvt-pankajsharma (#406), @GonEbal (#308/#408), @Patch76 (#411), @santanusinha (#413), @romanornr (#322/#323), @cmeerw (#312/#342/#354), @jjoos (#305/#340), @Codyzzz-zach (#344), @lilisir0722-crypto (#355), @EnziinSystem (#385), @Al2Klimov (#336), @hippus (#324), @yangsec888 (#364), @mydreamdoctor (#253), @zer09 (#373), @sponger94 (#294), @marcusyoung (#218/#219), @maksodf (#237-#242), @ehendrix23 (#252), @gaia (#258), @speeed76 (#200), @digijoebz (#213), @nishitpatel92 (#220), @Icaruk (#221), @xberry1231 (#222), @memniko (#227), @mandricmihai (#228), @tomerikjansen-crypto (#197), @cryptomaltese (#283), @SavageMessiah (#319), @caioribeiroclw-pixel (#388), @dergachoff (#330), @bschrib (#256), @Nick-CHI (#350), @edecarvalhoreis (#349), @Azami1990 (#367), @Carnival-z (#382).

If we missed anyone, tell us and we'll fix the notes — every report and PR moves this project forward.

Full changelog: v0.7.0...v0.8.0 — 206 commits

Security Verification

All release binaries scanned with 70+ antivirus engines — 0 detections.

Binary SHA-256 VirusTotal
darwin-amd64 d3e46236a32a9c9f7315... 0/72 ✅
darwin-arm64 cf28ac4e0d2cca9a5a07... 0/72 ✅
linux-amd64 ac4cae50ecad302a510f... 0/72 ✅
linux-arm64 abde85eea0c0a540afd0... 0/72 ✅
windows-amd64 d100a3ea2fd9fc05616a... 0/72 ✅

v0.7.0

12 Jun 07:49

📦 Archived release page — binaries for this version are no longer published. Get the latest release.

v0.7.0 — Hybrid LSP across six languages, community contributions, full-platform validation

🧠 The headline: hybrid LSP

This release is the one where the call graph stops being a guess.

For every supported language we now ship a hybrid Light-Semantic-Pass LSP — a per-file, type-aware resolver that runs inside extraction. It tracks scopes, infers expression types, follows imports, walks inheritance, and rewrites the resolved callee on every CALLS edge. The plain tree-sitter pass gives you the structural graph; the LSP pass gives you the right callee. Six languages now benefit — four of them gained the LSP for the first time in this release.

New LSPs (introduced this cycle)

Language | What the LSP does | Headline
Python | scope binding, expression typing, method dispatch, super(), decorators, multi-inheritance, TypedDict, match/case narrowing, walrus, comprehension element typing, generics & cross-file resolution | 100% on the bench; 95% on real-world instance-attribute resolution
PHP | Light Semantic Pass, generic templates, narrowing, self/static/$this, trait substitution, @phpstan-type aliases, Closure::bind, variance, 278 unit tests | parity-grade across Laravel/Symfony/Doctrine/Guzzle/PSR idioms
TypeScript / JavaScript / JSX / TSX | full TS semantic surface: unions, intersections, function types, JSX element resolution, dts mode, callback-param contextual typing | the full TS dialect family, one resolver
C# / .NET | class/struct/record/interface/enum, primary constructors, indexers, accessors, LINQ chains, generics, switch narrowing | C# 12 modern features land on day one

Sharpened LSPs (existed at v0.6.1; substantially upgraded here)

Language | What was sharpened
C / C++ / CUDA | Tier 2 pre-built per-language cross-LSP registry; template return-type substitution; namespace + class body walkers made O(n)
Go | Tier 2 + Tier 3 metadata-driven cross-LSP (skips the AST re-walk entirely); per-file walkers O(n); pooled thread-local parser

The Tier 1/2/3 architecture (this is the part that makes it tractable)

The LSP work is genuinely expensive — naively, every resolve worker would re-build a typing context per file. Instead it runs in three tiers:

  • Tier 1 — per-file LSP inside cbm_extract_file: scopes, expression typing, local resolution.
  • Tier 2 — pre-built per-language cross-LSP registry: a shared read-only base built once from every project def, then chained from per-file overlays via a fallback pointer. Resolve workers share it; no per-file re-walks of the entire registry. (Now wired for all six languages.)
  • Tier 3 — metadata-driven cross-LSP (Go): consumes the lsp_unresolved metadata the per-file pass emits and skips the cross-file AST re-walk entirely.
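
The Tier 2 idea of a per-file overlay chained to a shared read-only base through a fallback pointer can be sketched as follows. The struct layout and the names `def_entry`, `registry`, and `registry_lookup` are hypothetical, and the linear scan is only for brevity; a real registry would use a hash table.

```c
#include <stddef.h>
#include <string.h>

/* One definition known to the resolver (toy shape for this sketch). */
struct def_entry {
    const char *qualified_name;
    const char *return_type;
};

/* A registry layer: a flat entry array plus an optional fallback layer. */
struct registry {
    const struct def_entry *entries;
    size_t count;
    const struct registry *fallback; /* shared read-only base, or NULL */
};

/*
 * Look a name up in the per-file overlay first, then follow the fallback
 * chain into the shared base. The base is never copied or modified, so
 * many workers can point their overlays at the same instance.
 */
const struct def_entry *registry_lookup(const struct registry *reg, const char *name) {
    for (; reg != NULL; reg = reg->fallback) {
        for (size_t i = 0; i < reg->count; i++) {
            if (strcmp(reg->entries[i].qualified_name, name) == 0) return &reg->entries[i];
        }
    }
    return NULL;
}
```

Because a lookup only reads the base layer, the expensive project-wide build happens once, and each worker pays only for its own small overlay.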

Made O(n) on wide files

Every per-file LSP walker — process_node, ast_sweep_shapes, apply_jsdoc_signatures, infer_implicit_returns, *_resolve_calls_in_node, plus the seven extract-side DFS walkers — used the for (i=0; i<count; i++) ts_node_child(node, i) idiom. That's O(i) per call in tree-sitter, so O(n²) over a wide root. Fixtures like TypeScript's reallyLargeFile.ts (583k lines, almost all comments) made this catastrophic.
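
The cost difference can be made concrete with a toy tree whose children form a linked sibling list, so that indexed access has to walk from the first child. This is a stand-in and not tree-sitter itself: `toy_node`, `toy_child_at`, `visit_indexed`, and `visit_cursor` are hypothetical names, and the step counters exist only to make the cost visible.

```c
#include <stddef.h>

/* Toy node: children form a singly linked sibling list. */
struct toy_node {
    struct toy_node *first_child;
    struct toy_node *next_sibling;
};

/* Indexed access walks i links from the first child: O(i) per call. */
struct toy_node *toy_child_at(const struct toy_node *parent, size_t i, size_t *steps) {
    struct toy_node *c = parent->first_child;
    while (c != NULL && i-- > 0) {
        c = c->next_sibling;
        (*steps)++;
    }
    return c;
}

/* The quadratic idiom: for (i = 0; i < count; i++) child_at(parent, i). */
size_t visit_indexed(const struct toy_node *parent, size_t count) {
    size_t steps = 0;
    for (size_t i = 0; i < count; i++) (void)toy_child_at(parent, i, &steps);
    return steps;
}

/* Cursor-style walk: keep the position and advance one link per child. */
size_t visit_cursor(const struct toy_node *parent) {
    size_t steps = 0;
    for (const struct toy_node *c = parent->first_child; c != NULL; c = c->next_sibling) steps++;
    return steps;
}
```

For a parent with n children the indexed loop follows n(n-1)/2 links, while the cursor walk follows n. With 1,000 children that is 499,500 links against 1,000. In tree-sitter itself, the TSTreeCursor functions (`ts_tree_cursor_goto_first_child`, `ts_tree_cursor_goto_next_sibling`) provide this kind of linear walk.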

A new cbm_lsp_collect_children O(n) cursor helper + a shared ts_nstack_push_children for the extract DFS now apply across all six LSPs and the extractors. Numbers from the dry-run validation:

Repo | Files | LSP overrides | Before → After
microsoft/TypeScript | 40,689 | 7,432 | 5,100s → 50s (full); advanced 100s
dotnet/roslyn | 17,916 | 91,089 | crash → 46s
kubernetes (Go) | 20,650 | 80,818 | n/a → 51s
WordPress (PHP) | 3,622 | 7,303 | n/a → 7s
postgres (C) | 4,967 | 44 | n/a → 8s
torvalds/linux | 88,539 | 4,052 | n/a → 188s (full) / 207s (with LSP cross)

lsp_overrides is "calls whose callee the LSP re-attributed to a more precise target than the textual resolver inferred." For richly-typed languages (C#, Go, TS, PHP) that's tens of thousands of corrections.

Two real bugs fixed along the way

  • process_node indexed the NULL-terminated param_types[] by the call's arg count; a call with more args than the function declares params read past the terminator → deref of a garbage CBMType* → crash. Now bounded by a param_count scan.
  • return_type_of built tuple return types via cbm_type_tuple(NULL, …). Multi-return signatures now thread ctx->arena through.
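
The first bug is a classic over-read of a NULL-terminated array. A minimal sketch of the bounded lookup, assuming a toy `toy_type` struct in place of CBMType and hypothetical helper names:

```c
#include <stddef.h>

/* Stand-in for the resolver's type object; only its address matters here. */
struct toy_type {
    const char *name;
};

/* Count the entries of a NULL-terminated array of type pointers. */
size_t param_count(const struct toy_type *const *param_types) {
    size_t n = 0;
    while (param_types[n] != NULL) n++;
    return n;
}

/*
 * Return the declared type for argument `arg_index`, or NULL when the call
 * passes more arguments than the function declares. Without the bound, an
 * index past the terminator reads whatever memory follows the array.
 */
const struct toy_type *param_type_for_arg(const struct toy_type *const *param_types,
                                          size_t arg_index) {
    return arg_index < param_count(param_types) ? param_types[arg_index] : NULL;
}
```

Rescanning for the terminator on every lookup keeps the sketch short; code that resolves many arguments per call would compute the count once and reuse it.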

Always on

CBM_MODE_ADVANCED is removed. The LSP resolution now runs in every index mode (full, moderate, fast): with the walkers at O(n), the quality gain (~4% of calls re-attributed on TS, ~14% on Go) is worth the latency in every mode. "mode":"advanced" requests fall back to full (identical behaviour to the old advanced mode, so no breaking change). As a side benefit, the 104 test_incremental.c failures that were silently caused by incremental running with LSP gated off are now gone.


🌲 Pine Script & new platforms

🔗 Graph & extraction

  • USAGE edges for decorator applications — Python/TS frameworks light up properly — by Matthew Prock (@map588) (#208).
  • INHERITS edges fully emitted for Java extends + implements by Loay Chlih (@loaychlih) (#279).
  • Temporal properties on FILE_CHANGES_WITH edges and File nodes by Adam Schulte (#257).
  • C# 12 primary-constructor + field/property extraction, typed-stub factory constructors, and satellite-galaxy / cross-galaxy UI by @sponger94.
  • Build-tool path-alias resolution for cross-file imports and mode-skipped file preservation during incremental indexing by Peter Cox (#243, #251).
  • ES imports from embedded <script> blocks by James (@jmcmacnz) (#224).
  • Growable arena-allocated traversal stacks (no more fixed-size truncation) by Ahmed Mohammed (#217).

⚡ Performance (beyond LSP)

  • search_graph regex cache + LIKE pre-filter + cheap count and a two-step FTS5 sub-query to kill multi-minute search_graph query= latency, by Austen Constable (@awconstable) (#300 and follow-ups).
  • Verstable v2.2.1 hash table (by Jackson Allan) vendored as the new CBMHashTable engine — hot-path resolve speedup; 14 production callers (graph_buffer, registry, pipeline passes, watcher, semantic, …) get the new impl transparently.
  • Extract O(n²) eliminated across seven whole-tree DFS walkers (calls, channels, env-accesses, imports, type-assigns, variables) — the same ts_node_child(node, i) idiom that bit the LSP. microsoft/TypeScript full-mode: 5,097s → 50s (102×).
  • Tree-sitter source-buffer padding in every parse pass — eliminates a latent benign over-read at parse boundaries.

🛡️ Security

HackerOne researcher submissions landed by their original authors:

  • search_code multi-word regex patch by Jan Deelstra (#304).
  • sqlite_writer index-cell page guard by Jos Joosten (#303).
  • GitHub Actions shell injection in _build.yml fixed by Dustin Obrecht (@dLo999) (#249).

Plus:

  • search_graph default limit capped at 200 (was 500k — DoS hardening) by @amitmynd (#231).
  • Cypher / store buffer-overflow, OOM, and NULL-stmt crash hardening + thread-safety race elimination in log mutex, watcher, and indexer by Matthew Prock (@map588) (#206, #207).
  • ws bumped to 8.21.0 (GHSA-58qx-3vcg-4xpx / CVE-2026-45736).

💾 Reliability & store

  • CBM_SQLITE_MMAP_SIZE env-controlled mmap + PASSIVE checkpoint to prevent file-shrink under concurrent readers by @edwardmhughes (#315 + follow-up).
  • cbm_store_get_architecture wired into the MCP handler by Oliver Evans (@OliverEvans96) (#281).
  • safe_free / safe_str_free / safe_buf_free / safe_grow memory helpers, with rollout to heavy sites, by Matthew Prock (@map588).
  • trace_path qualified-name fallback and list_projects no longer hides tmp--prefixed projects by Justin Wiley (NVIDIA).
  • .m extension content-based disambiguation by @KuaaMU (#306).
  • AUR package docs by Chris Werner Rau (@cwrau) (#278).

✅ CI & build

The dry-run pipeline is fully green on every platform: lint (cppcheck + clang-format-20), security-static, CodeQL gate, test, build, smoke, quick soak — Windows + macOS arm64/intel + linux amd64/arm64.

🔍 Disclosures (release-readiness transparency)

Some changes in the CI-greening sweep were judgment calls, not pure fixes. Flagging them so reviewers can second-guess:

  1. LSP benchmarks (pylsp_bench_resolution_ratio, cslsp_bench_resolution_ratio) — gated behind CBM_SKIP_PERF for the dry run; in the release CI (skip_perf=false) they run with ASan-aware time budgets (×10 under sanitizers) and free the result before asserting so any future budget miss doesn't leak. The call-resolution-ratio quality is also asserted by the non-bench test_py_lsp and test_cs_lsp suites that always run in CI.
  2. Security audit allow-list extensions (each reviewed; documented inline):
    • URL allow-list: https://www.sqlite.org/c3ref/c_checkpoint_full.html — a doc reference i...

v0.6.1

12 Jun 07:49

📦 Archived release page — binaries for this version are no longer published. Get the latest release.

v0.6.1 — 89 New Languages, Cross-Repo Intelligence, Team-Shared Graph Artifact, npm+PyPI Distribution

50+ commits since v0.6.0. Adds 89 tree-sitter grammars (66 → 155 languages), introduces cross-repo intelligence with CROSS_* edges, ships team-shared graph artifacts (.codebase-memory/graph.db.zst), introduces full distribution wrappers (npm/PyPI/Homebrew/Scoop/Winget/Chocolatey/AUR/Go) with npm + PyPI now auto-publishing as part of the release pipeline, and rolls out comprehensive installer security hardening.

Languages & Parsing — 66 → 155

  • 89 new tree-sitter grammars vendored, with vocabulary-cleaned tokenization and grammar security audit script
  • Lang spec coverage filled in for 114 languages with proper node types — Go (func_literal), JS (do_statement, fixed stale case_clause), C#/Python imports, shared arrays
  • 77 new extension mapping tests covering the new languages
  • C#, Rust, Scala grammars updated to latest upstream
  • lang_specs refactor: designated initializers + factory pointer

Cross-Repo Intelligence

  • CROSS_* edge types for cross-repo dependencies and architectural relationships
  • gRPC / GraphQL / tRPC service detection with protobuf Route extraction
  • gRPC stub detection in call resolution + chained call extraction
  • Multi-galaxy UI layout + cross-repo architecture summary view

Team-Shared Graph Artifact

  • .codebase-memory/graph.db.zst — zstd-compressed knowledge graph that can be committed to the repo. Teammates bootstrap from the artifact instead of running a full reindex from scratch.
  • Vendored zstd 1.5.7 (amalgamated, ~52K LOC) for 8–13:1 compression
  • Two-tier export: zstd -9 + index stripping + VACUUM INTO for explicit indexes (best ratio); zstd -3 for watcher/incremental auto-updates (low-latency)
  • Import path: decompress → integrity check → auto-recreate indexes
  • Auto-bootstrap in index_repository: when no local DB exists but the artifact is present, import first then run incremental indexing
  • Auto-creates .gitattributes with merge=ours to prevent merge conflicts on the binary artifact

Imports & Channels

  • Generic package/module resolution for IMPORTS edges across 10 languages (resolves bare specifiers like @myorg/pkg, github.com/foo/bar, use my_crate::foo via manifest scanning: package.json, go.mod, Cargo.toml, pyproject.toml, composer.json, pubspec.yaml, pom.xml, build.gradle, mix.exs, *.gemspec)
  • Channel detection expanded from JS/TS to 8 languages

Distribution

Now installable directly from public package registries:

npm install -g codebase-memory-mcp                                 # npm
pip install codebase-memory-mcp                                    # PyPI
go install github.com/DeusData/codebase-memory-mcp/pkg/go@latest   # Go

  • npm + PyPI auto-publish integrated into the release pipeline (publish-registries job after verify, then atomic publish-final un-drafts the GitHub release only after both registries succeed — no half-shipped state)
  • npm package uses --provenance (GitHub OIDC build attestations visible on npmjs.com)
  • Full distribution wrappers in pkg/ for: npm, PyPI, Homebrew, Scoop, Winget, Chocolatey, AUR, Go

Security Hardening

  • PyPI installer: hardened against tar-slip and scheme-confusion attacks (PR #248 by @dLo999, closes #246)
  • npm installer: checksum verification, HTTPS-only redirects, no shell injection
  • Cross-installer hardening: removed Unblock-File, added HTTPS-only URL validation
  • vite bumped to 6.4.2, which fixes advisories GHSA-4w7w-66w2-5vf9 and GHSA-p9ff-h696-f583
  • Grammar security audit added to vendor pipeline
  • README: VirusTotal scan links (binary hashes), SLSA badge, Security & Trust section, transparency disclaimer, responsible-disclosure invitation
  • arXiv paper badge + citation

Stability & Quality

  • get_graph_schema now exposes property definitions per node label
  • sqlite_writer overflow pages — fixes SIGBUS on large records (#139)
  • RSS reclamation after delete_project: explicit mem_collect + immediate purge
  • MCP tools / CLI: improved error handling, diagnostics, and cancellation
  • Cherry-picked extraction & Cypher improvements from PR #162

Editor / Agent Integration

  • Kiro CLI support (#96)

Platform Fixes

  • Windows: pass_pkgmap now uses cbm_strndup (mingw clang lacks POSIX strndup)
  • test_watcher: uses GIT_AUTHOR_* / GIT_COMMITTER_* env vars instead of mutating global git config

CI / Smoke

  • Smoke test JSON parsing fixed — CLI default mode unwraps the MCP envelope; smoke now parses the inner JSON directly
  • Binary string audit allowlist: the telnet URI scheme from the rst grammar is documented as a known-benign match

Contributors

Thanks to everyone who contributed to this release.

Full changelog: v0.6.0...v0.6.1

Security Verification

All release binaries scanned with 70+ antivirus engines — 0 detections.

Binary SHA-256 VirusTotal
darwin-amd64 7836878876c8956f6413... 0/72 ✅
darwin-arm64 3e72c8cb364c431d99f1... 0/72 ✅
linux-amd64 7e6624b345f994afb901... 0/72 ✅
linux-arm64 ac2498c45235c1bf37f8... 0/72 ✅
windows-amd64 d773be23ed0823d58677... 0/72 ✅

v0.6.0

12 Jun 07:49

📦 Archived release page — binaries for this version are no longer published. Get the latest release.

v0.6.0 — Semantic Search, SIMILAR_TO Edges & Cross-Language Intelligence

85+ commits since v0.5.7. Major release adding vector-based semantic search, structural near-clone detection, cross-language import resolution, and significant quality-of-life improvements across all platforms.

Semantic Search & Vector Embeddings

  • semantic_query tool: keyword-based vector search across the entire codebase graph via cbm_cosine_i8 SQL function
  • Nomic nomic-embed-code embeddings: 40K pretrained token vectors (768d int8), distilled from nomic-ai/nomic-embed-code with simulated attention
  • 11-signal combined scoring: TF-IDF, Reflective Random Indexing, API/Type/Decorator signatures, AST structural profiles, approximate data flow, Halstead-lite metrics, MinHash, module proximity, graph diffusion
  • SEMANTICALLY_RELATED edges: connect functions with vocabulary mismatch but similar purpose (score >= 0.80, max 10 per node, same-language only)
  • Per-keyword min-cosine scoring replaces merged vector averaging for better precision
  • Score clamping to [0,1] — proximity multiplier no longer pushes scores above 1.0
  • Clone deduplication: SIMILAR_TO pairs with Jaccard >= threshold skip SEMANTICALLY_RELATED

SIMILAR_TO Edges (Near-Clone Detection)

  • MinHash fingerprinting: 64-hash signatures from leaf-only AST tokens with structural weighting
  • LSH index: band-based locality-sensitive hashing for O(1) candidate retrieval
  • Parallel scoring: worker pool queries LSH, scores candidates, emits edges
  • Unique trigram gate filters trivially short functions
  • SIMILAR_TO edges with Jaccard similarity and same-file flag in properties
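
The MinHash step can be sketched in a few lines. This version takes pre-hashed token ids and leaves out the structural weighting, the trigram gate, and the LSH index described above. The hash function (a splitmix64-style mixer) and all function names are choices made for this sketch; only the signature size of 64 comes from the notes.

```c
#include <stddef.h>
#include <stdint.h>

#define SIG_SIZE 64 /* matches the "64-hash signatures" in the notes */

/* splitmix64-style mixer, used here as a cheap seeded hash. */
static uint64_t mix64(uint64_t x) {
    x += 0x9E3779B97F4A7C15ULL;
    x = (x ^ (x >> 30)) * 0xBF58476D1CE4E5B9ULL;
    x = (x ^ (x >> 27)) * 0x94D049BB133111EBULL;
    return x ^ (x >> 31);
}

/* Build a signature: slot k keeps the minimum of hash_k over all tokens. */
void minhash_signature(const uint64_t *tokens, size_t n, uint64_t sig[SIG_SIZE]) {
    for (size_t k = 0; k < SIG_SIZE; k++) {
        uint64_t seed = mix64((uint64_t)k + 1);
        uint64_t best = UINT64_MAX;
        for (size_t i = 0; i < n; i++) {
            uint64_t h = mix64(tokens[i] ^ seed);
            if (h < best) best = h;
        }
        sig[k] = best;
    }
}

/* Number of agreeing slots; dividing by SIG_SIZE estimates Jaccard similarity. */
int minhash_matches(const uint64_t a[SIG_SIZE], const uint64_t b[SIG_SIZE]) {
    int same = 0;
    for (size_t k = 0; k < SIG_SIZE; k++) same += (a[k] == b[k]);
    return same;
}
```

Two functions with the same token set produce identical signatures regardless of token order. Banding the 64 slots into groups and hashing each group is what would turn this into the LSH candidate lookup.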

Full-Text Search

  • BM25 full-text search via FTS5 with cbm_camel_split tokenizer (camelCase/snake_case aware)
  • Incremental FTS5 rebuild on index updates
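
A camelCase/snake_case-aware split can be sketched as below. The actual rules of cbm_camel_split are not given in these notes, so `split_identifier` and its word-break rules are assumptions that show the general idea only.

```c
#include <ctype.h>
#include <stddef.h>

/*
 * Split an identifier into lowercase words separated by single spaces.
 * Word breaks: any non-alphanumeric byte (such as '_') and a transition
 * from a lowercase letter or digit to an uppercase letter
 * ("fooBar" -> "foo bar"). Acronym runs are not split from the word that
 * follows ("HTTPServer" -> "httpserver") in this simplified sketch.
 * Returns the number of bytes written, excluding the terminating NUL.
 */
size_t split_identifier(const char *in, char *out, size_t out_cap) {
    size_t w = 0;
    int prev_lower = 0;
    if (out_cap == 0) return 0;
    /* Each iteration writes at most two bytes; keep room for the NUL. */
    for (; *in != '\0' && w + 2 < out_cap; in++) {
        unsigned char c = (unsigned char)*in;
        if (!isalnum(c)) {
            if (w > 0 && out[w - 1] != ' ') out[w++] = ' ';
            prev_lower = 0;
            continue;
        }
        if (isupper(c) && prev_lower) out[w++] = ' ';
        out[w++] = (char)tolower(c);
        prev_lower = islower(c) || isdigit(c);
    }
    if (w > 0 && out[w - 1] == ' ') w--; /* drop a trailing separator */
    out[w] = '\0';
    return w;
}
```

Indexing the split words lets a search for "http request" match an identifier such as parseHttpRequest, which a whitespace-only tokenizer would treat as one opaque token.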

New Edge Types & Detection

  • EMITS / LISTENS_ON edges for Socket.IO, EventEmitter, and generic channel patterns
  • Constant resolution: const EVENT = "foo"; emit(EVENT) resolves channel names through per-file constant tables
  • IMPORTS edges with relative path resolution for JS/TS (./foo, ../bar), Python (.helpers, ..utils), Ruby
  • DATA_FLOWS edges with argument-to-parameter mapping + field access chains
  • Cross-service communication discovery + RAM-first incremental indexing
  • AST-based route registration replacing prescan infrastructure
  • HCL infrastructure binding extraction + prefix-decorator false positive fix
  • Generalized route registration + infra binding bridge

Graph Query & Tool Improvements

  • 6 previously-ignored params wired up: min_degree, max_degree, exclude_entry_points, include_connected, aspects filter, since for detect_changes
  • include_tests param on trace_path — mark test files in BFS results
  • risk_labels on trace_path for security-sensitive path tracing
  • --progress CLI flag for real-time indexing feedback
  • CBM_CACHE_DIR env var for configurable database directory
  • moderate index mode added to tool schema (between full and fast)
  • Schema properties exposure for param_names, param_types, decorators
  • include_connected fix: BFS inbound+outbound run separately (was merging incorrectly)

Quality of Life

  • Nested .gitignore support: subdirectory gitignores now respected during indexing — critical for monorepos (#178)
  • Skill consolidation: 4 separate skills merged into 1 with progressive disclosure
  • Smart update: skip update when already on latest version
  • Runtime binary detection in install command (no longer hardcoded)
  • Git submodule support in watcher: detect dirty state inside submodules
  • Fast→full mode change detection + auto-enable UI for ui-variant binary
  • Layout endpoint: O(n*e) edge mapping replaced with binary search
  • Layout JSON: handle invalid UTF-8 and NaN in serialization
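
The layout change from an O(n*e) scan to binary search can be sketched with the standard library's bsearch. The notes do not say what exactly is searched, so the sorted array of 64-bit node ids and the name `node_index` are assumptions.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* bsearch comparator for 64-bit node ids. */
static int cmp_id(const void *a, const void *b) {
    int64_t x = *(const int64_t *)a, y = *(const int64_t *)b;
    return (x > y) - (x < y);
}

/*
 * Map a node id to its position in `sorted_ids` (ascending, unique).
 * Returns -1 when the id is absent. Each lookup is O(log n), so resolving
 * both endpoints of e edges costs O(e log n) instead of O(n * e).
 */
long node_index(const int64_t *sorted_ids, size_t n, int64_t id) {
    const int64_t *hit = bsearch(&id, sorted_ids, n, sizeof *sorted_ids, cmp_id);
    return hit != NULL ? (long)(hit - sorted_ids) : -1;
}
```

The comparator uses `(x > y) - (x < y)` rather than subtraction so that large ids cannot overflow the int result.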

Platform Fixes

  • Windows: Zed/VS Code/KiloCode config paths, PATH delimiter, S_IXUSR check, agent detection using home_dir-relative paths, APPDATA-based userconfig test
  • Linux portable: Alpine musl compatibility, security audits added to smoke tests, XDG_CONFIG_HOME in smoke environment
  • Cross-platform vector blob assembly: preprocessor conditionals for macOS Mach-O / Linux ELF / Windows COFF
  • C++ SEGV fix: NULL deref in LSP type resolver on large header files

Code Quality & Linting

  • 337 linter warnings resolved across 16 files (named constants, cognitive complexity extraction)
  • Cognitive complexity threshold set to industry default (25), 168 functions split
  • cbm_write_db god-function split (569 → 325 lines)
  • All NOLINTNEXTLINE suppressions eliminated, iterative AST walkers
  • ASan leak fix in semantic corpus token_map

CI/CD & Security

  • Decoupled security gate: security-static + CodeQL run independently, don't block test/build/smoke pipeline
  • Security audits on ALL binary variants (standard + UI) — previously UI binaries were unaudited
  • AV-safe token vocabulary: 11 heuristic-triggering words removed from Nomic embeddings
  • CI split into reusable workflow components
  • Vendored dependency bumps: SQLite 3.51.3, Mongoose 7.21, mimalloc 3.2.8
  • Actions bumped: download-artifact v8.0.1, attest-sbom v2, cosign v4.1.1, msys2 v2.31.0, checkout v6.0.2, cache v5.0.4, upload-artifact v7.0.0, attest-build-provenance v4.1.0, codeql-action v4.35.1

Contributors

  • @halindrome — Git submodule dirty state detection, risk_labels on trace_path
  • @Koolerx — C# Interface registry fix, base_list handler, FTS5 BM25 search, JS/TS IMPORTS resolution, Channel schema
  • @dLo999 — CBM_CACHE_DIR configurable database directory, skip-update-when-latest, nested .gitignore support (#178)
  • @Selene29 — Layout binary search optimization, UTF-8/NaN serialization fix
  • @slvnlrt — Windows PATH delimiter fix, runtime binary path detection
  • @jimpark — Zed and VS Code Windows config path fixes
  • @ahundt — Wire up silently-ignored search_graph params
  • @maplenk — include_tests param on trace_path, search_graph param wiring
  • @gdilla — Skill consolidation, risk_labels + --progress CLI flag

VirusTotal Scan Results

All release artifacts scanned — 0 detections across all engines.

File Engines Detections Report
codebase-memory-mcp-linux-amd64 64 0 View
codebase-memory-mcp-linux-arm64 62 0 View
codebase-memory-mcp-linux-amd64-portable 64 0 View
codebase-memory-mcp-linux-arm64-portable 62 0 View
codebase-memory-mcp-darwin-arm64 63 0 View
codebase-memory-mcp-darwin-amd64 61 0 View
codebase-memory-mcp-windows-amd64.exe 71 0 View
codebase-memory-mcp-ui-linux-amd64 64 0 View
codebase-memory-mcp-ui-linux-arm64 61 0 View
codebase-memory-mcp-ui-linux-amd64-portable 64 0 View
codebase-memory-mcp-ui-linux-arm64-portable 63 0 View
codebase-memory-mcp-ui-darwin-arm64 62 0 View
codebase-memory-mcp-ui-darwin-amd64 61 0 View
codebase-memory-mcp-ui-windows-amd64.exe 71 0 View
install.sh 62 0 View
install.ps1 62 0 View
LICENSE 61 0 View

v0.5.7

26 Mar 13:22

v0.5.7 — Stability, Install & Endurance Testing Overhaul

53 commits, 3 merged PRs, 10 bugs closed. The most significant stability release since the Go→C rewrite.

Database Concurrency Fix (Critical)

  • Root cause found: three threads (MCP handler, autoindex, watcher) could corrupt the database — rename(.db.tmp, .db) over open SQLite connections produced 48K+ garbage rows
  • Architecture change: rename() eliminated entirely. Indexing writes directly, reindexing deletes old DB first, incremental upserts unchanged
  • Pipeline lock serializes concurrent runs; corrupt DB auto-detected and cleaned

Install & Update

  • install.sh + install.ps1 included in every release archive with --skip-config flag (#145)
  • Kills stale MCP servers, strips macOS quarantine, ad-hoc signs binary
  • Refreshes all 10 agent configs on every update
  • In-memory zip extraction — no unzip needed on Windows
  • Windows .exe path handling fixed across install, update, and uninstall

Windows Path Normalization (PR #146)

  • Mixed path separators normalized to forward slashes at all entry points
  • cbm_normalize_path_sep() works on ALL platforms (cross-platform DB files)
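
The separator normalization described above can be illustrated with a short sketch. This is not the project's `cbm_normalize_path_sep()`, whose signature is not shown in these notes; `normalize_path_sep` is a hypothetical stand-in that rewrites backslashes in place, assuming a NUL-terminated, writable path buffer.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper (illustration only): rewrite every backslash in
 * `path` to a forward slash, in place. Returns how many characters
 * were replaced. Runs the same way on every platform, so a path stored
 * on Windows compares equal when the database is opened elsewhere. */
size_t normalize_path_sep(char *path)
{
    assert(path != NULL);
    size_t replaced = 0;
    for (char *p = path; *p != '\0'; ++p) {
        if (*p == '\\') {
            *p = '/';
            ++replaced;
        }
    }
    return replaced;
}
```

Applying such a function at every entry point, rather than only on Windows, is what would keep paths inside a shared database file consistent across platforms.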

Soak Test Suite (New)

  • Quick soak (10 min), ASan soak (15 min), weekly endurance (4h) — all per-platform
  • RSS tracking, FD drift, query latency, crash recovery (kill -9 + clean restart)
  • All soak tiers are release gates — no release ships without passing

Bug Fixes

  • #139 Stack overflow in autoindex — 8MB default thread stack (thanks @theron-sapp for the detailed crash report with stack addresses, frequency table, and workaround!)
  • #140 index_repository fails on Windows — this report by @Flipper1994 triggered the complete concurrency architecture overhaul!
  • #137 detect_changes fails on paths with spaces (thanks @shekthesnek for the sharp observation that 12 tools worked but 1 didn't!)
  • #135 macOS Gatekeeper blocks binary (thanks @heraque for the thorough xattr/spctl/codesign analysis!)
  • #133 search_code rejects Windows backslash (thanks @ckelly8 for pinpointing the root cause!)
  • #130 O(N²) import extractors hang on large files (thanks @halindrome for both the issue AND the fix in PR #131!)
  • #127 Connection closed constantly — all crash paths fixed (thanks @kingofthebongo2008!)
  • #145 Skip agent config in install scripts (thanks @sherif-fanous — implemented same day!)
  • Arena buffer overflow, test detection gaps, memory leaks, CodeQL TOCTOU, taskkill self-kill, MSYS2 python3 path translation, vendored tre ssize_t

Testing

  • 2586 unit tests (up from 2042), zero skipped, zero memory leaks
  • 480+ new tests covering arena, FQN, graph buffer, MCP dispatch, pipeline, store, YAML, watcher
  • 15-phase smoke suite on all platforms including Windows
  • Soak tests as release gate — endurance verified before every release

Security

  • Install scripts in VirusTotal scan alongside binaries (120 min timeout, all files must pass)
  • system() eliminated from all production code
  • Vendored dependency integrity checksums enforced
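
Dropping `system()` generally means spawning child processes directly, so that no shell ever parses the arguments. The following is a minimal sketch of that idea, assuming a POSIX host; `exec_no_shell` is a hypothetical name and the code is not taken from the project.

```c
#define _POSIX_C_SOURCE 200809L
#include <assert.h>
#include <errno.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical sketch: run argv[0] with the given arguments and return
 * its exit status, or -1 on failure. The argument vector is passed to
 * the program as-is, so shell metacharacters stay literal text. */
int exec_no_shell(char *const argv[])
{
    assert(argv != NULL && argv[0] != NULL);

    pid_t pid = fork();
    if (pid < 0)
        return -1;                  /* fork failed */

    if (pid == 0) {
        execvp(argv[0], argv);      /* returns only on error */
        _exit(127);                 /* conventional "command not found" */
    }

    int status = 0;
    while (waitpid(pid, &status, 0) < 0) {
        if (errno != EINTR)
            return -1;              /* unexpected wait failure */
    }
    return WIFEXITED(status) ? WEXITSTATUS(status) : -1;
}
```

With this shape, an argument such as `"; exit 3"` is handed to the child as a plain string instead of being interpreted as a second command.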

Contributors 🙏

Every bug report and PR made this release better. Thank you:

Contributor Contribution
@halindrome O(N²) import fix (PR #131) — merged
@jimpark Windows path normalization (PR #146) — merged
@chitralverma OpenCode config format fix (PR #134) — merged
@theron-sapp Stack overflow crash report (#139) — fixed
@Flipper1994 Windows rename failure (#140) — fixed, triggered concurrency overhaul
@shekthesnek Windows path-with-spaces (#137) — fixed
@heraque macOS quarantine analysis (#135) — fixed
@ckelly8 Windows backslash root cause (#133) — fixed
@kingofthebongo2008 Connection stability (#127) — fixed
@sherif-fanous Skip-config feature request (#145) — implemented

Security Verification

All release binaries have been independently verified:

VirusTotal — scanned by 70+ antivirus engines:

Binary Scan
install.sh View Report
install.ps1 View Report
codebase-memory-mcp-windows-amd64.exe View Report
codebase-memory-mcp-ui-windows-amd64.exe View Report
codebase-memory-mcp-ui-linux-arm64 View Report
codebase-memory-mcp-ui-linux-amd64 View Report
codebase-memory-mcp-ui-darwin-arm64 View Report
codebase-memory-mcp-ui-darwin-amd64 View Report
codebase-memory-mcp-linux-arm64 View Report
codebase-memory-mcp-linux-amd64 View Report
codebase-memory-mcp-darwin-arm64 View Report
codebase-memory-mcp-darwin-amd64 View Report
LICENSE View Report
Build Provenance (SLSA) — cryptographic proof each binary was built by GitHub Actions from this repo:
gh attestation verify <downloaded-file> --repo DeusData/codebase-memory-mcp

Sigstore cosign — keyless signature verification:

cosign verify-blob --bundle <file>.bundle <file>

Native antivirus scans — all binaries passed these scans before this release was created (any detection would have blocked the release):

  • Windows: Windows Defender with ML heuristics (the same engine end users run)
  • Linux: ClamAV with daily signature updates
  • macOS: ClamAV with daily signature updates

SBOM — Software Bill of Materials (sbom.json) lists all vendored dependencies.

See SECURITY.md for full details.

v0.5.6

23 Mar 22:04

What's New in v0.5.6

search_code v2 — Graph-Augmented Code Search

The search_code tool has been completely rewritten with a 4-phase pipeline that combines grep speed with knowledge graph intelligence:

  • 3 output modes: compact (default — function names + match lines), full (complete function bodies with highlighted matches), files (file list with match counts)
  • Graph ranking: results ranked by structural importance (definitions first, popular functions next, tests last)
  • Block expansion: grep matches automatically expanded to containing function boundaries — no more fragmented line snippets
  • path_filter: scope searches to specific directories (e.g., src/ only)
  • context lines: configurable context around matches in full mode
  • Directory distribution summary: shows which directories contain matches

Falls back gracefully to raw grep when the project isn't indexed.
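
As a rough illustration of the ranking order described above (definitions first, popular functions next, tests last), here is a small comparator sketch. The `search_hit` struct, its fields, and the use of a caller count as the popularity signal are all assumptions made for the example, not the tool's actual data model.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result record; field names are illustrative only. */
typedef struct {
    const char *name;
    int is_definition;  /* 1 if the match is a symbol definition */
    int is_test;        /* 1 if the match lives in test code */
    int caller_count;   /* assumed popularity signal */
} search_hit;

/* Lower tier sorts first: definitions, then other code, then tests. */
static int hit_tier(const search_hit *h)
{
    if (h->is_test)
        return 2;
    return h->is_definition ? 0 : 1;
}

static int compare_hits(const void *a, const void *b)
{
    const search_hit *x = a, *y = b;
    int tx = hit_tier(x), ty = hit_tier(y);
    if (tx != ty)
        return tx - ty;
    /* Within a tier, the hit with more callers ranks higher. */
    return (y->caller_count > x->caller_count) -
           (y->caller_count < x->caller_count);
}

/* Sort hits in place by structural importance. */
void rank_hits(search_hit *hits, size_t n)
{
    assert(hits != NULL || n == 0);
    qsort(hits, n, sizeof *hits, compare_hits);
}
```

A tiered comparator like this keeps test matches at the bottom even when they are heavily referenced, which matches the ordering the notes describe.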

Kubernetes & Kustomize Indexing

Full infrastructure-as-code support for Kubernetes manifests:

  • Parses Deployments, Services, ConfigMaps, Secrets, Ingress, CronJobs, and 20+ resource types
  • Kustomize overlay resolution (base → overlay relationships)
  • Resource nodes appear in the knowledge graph with labels, namespaces, and container specs
  • New Resource node label in graph schema

User-Defined Extension Mappings

Custom file extension → language mappings via .codebase-memory.json (project-level) or $XDG_CONFIG_HOME/codebase-memory-mcp/config.json (global):

{"extra_extensions": {".blade.php": "php", ".mjs": "javascript"}}

Project config takes priority over global config.
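
The priority rule can be sketched as a two-level lookup. The types and function names below are hypothetical, and matching on the longest suffix (so that `.blade.php` can be mapped at all) is an assumption of this example rather than documented behavior.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical in-memory form of one "extra_extensions" entry. */
typedef struct {
    const char *ext;   /* e.g. ".blade.php" */
    const char *lang;  /* e.g. "php" */
} ext_mapping;

/* Return 1 if `path` ends with `suffix`. */
static int has_suffix(const char *path, const char *suffix)
{
    size_t lp = strlen(path), ls = strlen(suffix);
    return ls <= lp && strcmp(path + lp - ls, suffix) == 0;
}

/* Longest matching extension wins within one config (assumption). */
static const char *lookup(const ext_mapping *maps, size_t n, const char *path)
{
    const char *best = NULL;
    size_t best_len = 0;
    for (size_t i = 0; i < n; ++i) {
        size_t len = strlen(maps[i].ext);
        if (len > best_len && has_suffix(path, maps[i].ext)) {
            best = maps[i].lang;
            best_len = len;
        }
    }
    return best;
}

/* Project-level mappings are consulted first; global ones are a fallback.
 * Returns NULL when neither config maps the file. */
const char *resolve_language(const ext_mapping *project, size_t np,
                             const ext_mapping *global, size_t ng,
                             const char *path)
{
    assert(path != NULL);
    const char *lang = lookup(project, np, path);
    return lang != NULL ? lang : lookup(global, ng, path);
}
```

A `NULL` result would leave the file to whatever built-in extension table the indexer already uses.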

Security Fixes

  • SQL injection in store search/BFS and argument injection in HTTP server (#124, @map588)
  • Use-after-free in handle_manage_adr get path (#126, @halindrome)
  • Ghost .db file prevention: query handlers now verify project exists before opening SQLite — prevents empty database files from accumulating (#120)
  • Binary replacement: new cbm_replace_binary with unlink-before-write pattern, handles read-only targets and Windows rename-aside fallback (#114)

Stability & Compatibility Fixes

  • MCP stdio buffering: fixed poll()/getline() FILE* mismatch that caused tools/list to hang on some clients (#99, @halindrome)
  • SQLite WAL busy_timeout: set before journal_mode=WAL to prevent SQLITE_BUSY on lock contention (#117, @halindrome)
  • Import parser O(N²) → O(N): replaced indexed ts_node_child() loop with TSTreeCursor walk, fixing the quadratic slowdown on files with many imports (#107, @halindrome)
  • Session project name mismatch: detect_session now uses same cbm_project_name_from_path() as pipeline
  • Windows: UI zip filename fix, setenv/unsetenv compat wrappers, USERPROFILE fallback when HOME unset
  • Linux: add -D_GNU_SOURCE for strcasestr visibility (#111, @trollkotze)
  • libgit2: fix -Wmissing-field-initializers build error (#91, @jsyrjala)
  • Memory leak: resolve_store leaked SQLite connection when querying unlinked .db after delete_project
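
The import parser change from O(N²) to O(N) comes down to how siblings are visited. The sketch below is an analogy using a plain linked list, not tree-sitter code: it assumes that fetching child `i` by index costs about `i` hops, and counts hops for an indexed loop versus a single cursor-style walk. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Sibling chain standing in for the children of one syntax node. */
typedef struct node {
    struct node *next;
} node;

static size_t g_steps;  /* pointer hops / visits, for illustration */

/* Indexed access: restarts from the head every time, so it costs O(i). */
static node *child_at(node *head, size_t i)
{
    node *n = head;
    for (; n != NULL && i > 0; --i) {
        n = n->next;
        ++g_steps;
    }
    return n;
}

/* Visit children by index: 0 + 1 + ... + (count-1) hops in total. */
size_t visit_indexed(node *head, size_t count)
{
    assert(head != NULL || count == 0);
    g_steps = 0;
    for (size_t i = 0; i < count; ++i)
        (void)child_at(head, i);
    return g_steps;
}

/* Visit children with a cursor: one step per child. */
size_t visit_cursor(node *head)
{
    g_steps = 0;
    for (node *cur = head; cur != NULL; cur = cur->next)
        ++g_steps;
    return g_steps;
}
```

For 100 siblings the indexed loop performs 4,950 hops while the cursor walk performs 100 steps, which is the kind of gap that turns into a hang on files with very many imports.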

Comprehensive Smoke Tests

Expanded from 4 phases to 7, covering the full binary lifecycle:

  • Phase 5: MCP stdio transport — initialize handshake, tools/list, tool call round-trip, Content-Length framing (OpenCode compatibility)
  • Phase 6: CLI subcommands — install/uninstall/update --dry-run, config set/get/reset, simulated binary replacement with read-only edge case
  • Phase 7: MCP advanced tool calls — search_code v2, get_code_snippet via JSON-RPC
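
For readers unfamiliar with the Content-Length framing exercised in Phase 5, here is a minimal parsing sketch. It assumes a single `Content-Length: N` header followed by a blank line and `N` payload bytes; `parse_frame` is a hypothetical function and does not reflect how the server itself is written.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical parser sketch. Returns 1 and sets *body / *body_len when
 * `buf` holds a complete frame, 0 when more bytes are needed, and -1
 * when the header is malformed. */
int parse_frame(const char *buf, size_t len,
                const char **body, size_t *body_len)
{
    static const char key[] = "Content-Length: ";
    static const char sep[] = "\r\n\r\n";
    const size_t klen = sizeof key - 1;
    assert(buf != NULL && body != NULL && body_len != NULL);

    /* Locate the blank line that ends the header block. */
    const char *end = NULL;
    for (size_t i = 0; i + 4 <= len; ++i) {
        if (memcmp(buf + i, sep, 4) == 0) {
            end = buf + i;
            break;
        }
    }
    if (end == NULL)
        return 0;                       /* header not complete yet */

    if ((size_t)(end - buf) <= klen || memcmp(buf, key, klen) != 0)
        return -1;                      /* missing or empty length */

    /* Parse the digits by hand: the input is not NUL-terminated. */
    size_t n = 0;
    for (const char *p = buf + klen; p < end; ++p) {
        if (*p < '0' || *p > '9')
            return -1;
        if (n > (SIZE_MAX - 9) / 10)
            return -1;                  /* length would overflow */
        n = n * 10 + (size_t)(*p - '0');
    }

    const char *payload = end + 4;
    if (len - (size_t)(payload - buf) < n)
        return 0;                       /* body not fully received */

    *body = payload;
    *body_len = n;
    return 1;
}
```

The three-way return value lets a read loop keep appending bytes until a full message is available, which is the main difference from newline-delimited transports.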

Smoke tests now run in Docker test infrastructure (test-infrastructure/run.sh smoke) and in CI on all 10 platform×variant combinations.

Update Command Improvements

  • --dry-run flag: shows what would happen without downloading or modifying files
  • --standard / --ui flags: skip interactive variant prompt (CI-friendly)
  • Restart reminder after successful update

CI & Infrastructure

  • Pinned GitHub Actions to commit SHAs (dependabot: VirusTotal 5.0.0, setup-node 6.3.0, cosign-installer, attest-build-provenance)
  • Docker test infra: smoke and smoke-amd64 services for local cross-platform smoke testing
  • Cleaned up Go-era artifacts, updated THIRD_PARTY.md for pure C project

Contributors

A huge thank you to everyone who contributed to this release:

  • @halindrome — Outstanding contributions across the board: K8s/Kustomize indexing, user-defined extension mappings, MCP stdio fix, WAL ordering fix, ghost .db prevention, use-after-free fix, O(N²) import parser fix, and WAL journal mode fix. The backbone of this release.
  • @map588 — Critical SQL injection and argument injection security fix
  • @trollkotze — Linux build fix for strcasestr visibility
  • @jsyrjala — Build fix for libgit2 field initializers
  • @bingh0 — VS Code compatibility fixes (schema validation, install registration, protocol negotiation)

Thank you all for making codebase-memory-mcp better!


Security Verification

All release binaries have been independently verified:

VirusTotal — scanned by 70+ antivirus engines:

Binary Scan
codebase-memory-mcp-windows-amd64.exe View Report
codebase-memory-mcp-ui-windows-amd64.exe View Report
codebase-memory-mcp-ui-linux-arm64 View Report
codebase-memory-mcp-ui-linux-amd64 View Report
codebase-memory-mcp-ui-darwin-arm64 View Report
codebase-memory-mcp-ui-darwin-amd64 View Report
codebase-memory-mcp-linux-arm64 View Report
codebase-memory-mcp-linux-amd64 View Report
codebase-memory-mcp-darwin-arm64 View Report
codebase-memory-mcp-darwin-amd64 View Report
LICENSE View Report
Build Provenance (SLSA) — cryptographic proof each binary was built by GitHub Actions from this repo:
gh attestation verify <downloaded-file> --repo DeusData/codebase-memory-mcp

Sigstore cosign — keyless signature verification:

cosign verify-blob --bundle <file>.bundle <file>

Native antivirus scans — all binaries passed these scans before this release was created (any detection would have blocked the release):

  • Windows: Windows Defender with ML heuristics (the same engine end users run)
  • Linux: ClamAV with daily signature updates
  • macOS: ClamAV with daily signature updates

SBOM — Software Bill of Materials (sbom.json) lists all vendored dependencies.

See SECURITY.md for full details.

v0.5.5

21 Mar 23:07

Security

  • CodeQL SAST — static analysis with build-mode manual (100% source coverage), zero open alerts gate
  • Shell injection elimination — replaced system() calls with cbm_exec_no_shell() (fork+execvp), no tainted data reaches a shell
  • snprintf overflow fixes — 11 buffer overflow vulnerabilities fixed (clamp offset after each append)
  • TOCTOU race fixes — atomic file permissions, open-then-fstat pattern
  • 31 security defense tests — shell injection, SQLite authorizer, SQL injection via Cypher, path containment, shell-free exec
  • Fuzz testing — random/mutated JSON-RPC + Cypher inputs on every build
  • Native antivirus scanning — Windows Defender, ClamAV (Linux + macOS) on every build
  • VirusTotal zero-tolerance gate — all release binaries scanned by 70+ engines before publish
  • SLSA provenance + Sigstore cosign + SBOM (SPDX 2.3) + SHA-256 checksums
  • GitHub Actions pinned to SHA with Dependabot
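
The snprintf fixes mentioned above rely on clamping the write offset after each append. The underlying hazard is that `snprintf` returns the length it would have written, so blindly adding that to an offset can push it past the buffer. The helper below is a sketch of the clamping idea with a hypothetical name, not the project's code.

```c
#include <assert.h>
#include <stdarg.h>
#include <stddef.h>
#include <stdio.h>

/* Hypothetical helper: append formatted text at buf[*off] and clamp *off
 * so it never exceeds cap - 1. Returns 1 if the text fit completely and
 * 0 if it was truncated (or formatting failed). */
int append_clamped(char *buf, size_t cap, size_t *off, const char *fmt, ...)
{
    assert(buf != NULL && off != NULL && cap > 0 && *off < cap);

    va_list ap;
    va_start(ap, fmt);
    int n = vsnprintf(buf + *off, cap - *off, fmt, ap);
    va_end(ap);

    if (n < 0)
        return 0;                   /* encoding error: offset unchanged */

    size_t room = cap - *off - 1;   /* bytes available before the NUL */
    if ((size_t)n > room) {
        *off = cap - 1;             /* clamp: later appends become no-ops */
        return 0;
    }
    *off += (size_t)n;
    return 1;
}
```

Because the offset can never pass `cap - 1`, the `cap - *off` size passed to the next call cannot wrap around to a huge unsigned value.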

Antivirus false positive prevention

Added multi-layer AV scanning to the build pipeline to catch and prevent false positives before they reach users. Removed DLL resolve tracking strings that triggered heuristic detection. Every binary in this release has been verified clean by 70+ antivirus engines via VirusTotal. (Fixes #89)

New features

  • Content-Length framed transport — OpenCode compatibility
  • 10 agent detection — OpenClaw + VS Code support
  • Dual MCP config location: ~/.claude/.mcp.json + ~/.claude.json

Bug fixes

  • Fix Swift call extraction: 0 CALLS edges (#43)
  • Fix Laravel route false positives: extension scoping + path filter (PR #65)
  • Port FastAPI Depends() edge tracking (PR #66)
  • Keep WAL journal mode during bulk write (PR #72)
  • Fix VS Code compatibility (PR #79)
  • Remove DLL resolve tracking (Windows Defender false positive)

Contributors

Thanks to @halindrome, @bingh0, @mariomeyer, @kingchenc for code contributions, and @Maton-Nenoso for reporting #89 which led to the comprehensive AV scanning infrastructure in this release.


Security Verification

All release binaries have been independently verified:

VirusTotal — scanned by 70+ antivirus engines:

Binary Scan
codebase-memory-mcp-darwin-amd64 View Report
codebase-memory-mcp-darwin-arm64 View Report
codebase-memory-mcp-linux-amd64 View Report
codebase-memory-mcp-linux-arm64 View Report
codebase-memory-mcp-ui-darwin-amd64 View Report
codebase-memory-mcp-ui-darwin-arm64 View Report
codebase-memory-mcp-ui-linux-amd64 View Report
codebase-memory-mcp-ui-linux-arm64 View Report
codebase-memory-mcp-ui.exe View Report
codebase-memory-mcp-windows-amd64.exe View Report
LICENSE View Report
Build Provenance (SLSA) — cryptographic proof each binary was built by GitHub Actions from this repo:
gh attestation verify <downloaded-file> --repo DeusData/codebase-memory-mcp

Sigstore cosign — keyless signature verification:

cosign verify-blob --bundle <file>.bundle <file>

Native antivirus scans — all binaries passed these scans before this release was created (any detection would have blocked the release):

  • Windows: Windows Defender with ML heuristics (the same engine end users run)
  • Linux: ClamAV with daily signature updates
  • macOS: ClamAV with daily signature updates

SBOM — Software Bill of Materials (sbom.json) lists all vendored dependencies.

See SECURITY.md for full details.

v0.5.3

20 Mar 13:09

Incremental Reindex

Auto-detects previously indexed projects and re-parses only changed files.

  • mtime+size classification against stored hashes
  • Surgical node deletion (edges cascade), re-parse only deltas
  • Instant no-op (<1ms) when nothing changed
  • Auto-routes: first run = full RAM pipeline, subsequent = incremental disk

| Scenario | Time |
| --- | --- |
| Nothing changed | <1ms |
| 1 file modified | ~2ms |
| 1 file added/deleted | ~1ms |
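
The mtime+size classification above can be pictured with a small sketch. It assumes the store keeps one record per file with a path, modification time, and size; the struct, enum, and function names are hypothetical, and the real index may key on different data.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical snapshot of one file as recorded at the last index run. */
typedef struct {
    const char *path;
    int64_t mtime_ns;
    int64_t size;
} file_stat;

typedef enum { FILE_UNCHANGED, FILE_MODIFIED, FILE_ADDED } file_change;

/* Compare a file found on disk with the stored snapshots. A file is
 * re-parsed only when it is new or its mtime or size differs. */
file_change classify_file(const file_stat *stored, size_t n,
                          const file_stat *current)
{
    assert(current != NULL && current->path != NULL);
    for (size_t i = 0; i < n; ++i) {
        if (strcmp(stored[i].path, current->path) != 0)
            continue;
        if (stored[i].mtime_ns == current->mtime_ns &&
            stored[i].size == current->size)
            return FILE_UNCHANGED;
        return FILE_MODIFIED;
    }
    return FILE_ADDED;
}
```

Deleted files would be found by the reverse check: any stored path that no longer appears on disk. When every file classifies as unchanged, nothing needs re-parsing, which is consistent with the near-instant no-op case in the table.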

ADR Hints

  • index_repository: adr_present + adr_hint when no ADR exists
  • get_graph_schema: adr_present + adr_hint per project
  • manage_adr GET: creation hint when no ADR

Simplified get_code_snippet

Streamlined to exact QN + suffix matching. Guides users to search_graph when symbol not found.

Upgrading

```bash
codebase-memory-mcp update
```

v0.5.2

19 Mar 21:40

Fixes

  • Release RAM after indexing: Call `mi_collect(true)` after pipeline completion to return mimalloc pages to the OS. On Linux this immediately reduces RSS; on macOS pages are marked reusable (cosmetically retained until memory pressure).
  • Standalone Windows binary: Add `-static` to Windows linker flags. The binary no longer requires `libc++.dll`, `libunwind.dll`, or any MSYS2/CLANG64 runtime DLLs — fully self-contained .exe.

Upgrading

```bash
codebase-memory-mcp update
```