
feat: git-based cache fingerprinting for instant cache lookups#42

Merged
jonathanpopham merged 2 commits into supermodeltools:main from jonathanpopham:feat/cache-fingerprint
Apr 3, 2026

Conversation

@jonathanpopham
Contributor

@jonathanpopham jonathanpopham commented Apr 3, 2026

Partial #33.

Problem

Every command created a zip and SHA256'd it before checking the cache. On supermodel-public-api, this took 1-10 seconds — even when the result was already cached.

Fix

Use git rev-parse HEAD (~1ms) to check the cache before creating a zip. Only create and upload the zip on a cache miss.

Performance

Before (cache hit): 1-10 seconds (create zip, hash, find cached result)
After  (cache hit): 35 milliseconds (git fingerprint, find cached result)

1800x faster cache hits on supermodel-public-api.

How it works

  1. cache.RepoFingerprint(dir) — returns commit SHA for clean repos (~1ms), commitSHA:diffHash for dirty repos (~100ms)
  2. cache.AnalysisKey(fingerprint, "graph") — hashes fingerprint + analysis type into a cache key
  3. analyze.GetGraph() checks fingerprint cache first, falls back to zip-hash cache, only uploads on full miss
  4. On cache miss, stores result under both fingerprint key and zip hash key

Changes

  • internal/cache/fingerprint.go — new RepoFingerprint(), AnalysisKey()
  • internal/cache/fingerprint_test.go — 7 tests (clean, dirty, stable, commit changes, not-git, key types)
  • internal/analyze/handler.go — GetGraph() uses fingerprint fast path

Test plan

  • go test ./... — all 14 packages pass
  • Live: supermodel analyze --force (63s) then supermodel analyze (0.035s)

Summary by CodeRabbit

  • New Features

    • Faster caching: analysis now tries a repository-state fingerprint as the primary cache key, falling back to the archive hash when needed. Successful analyses are written to cache under both the fingerprint-based key and the archive-hash key.
  • Tests

    • Added tests validating repository fingerprint behavior, determinism, dirty/clean distinctions, and analysis-key generation for different analysis types.

@coderabbitai

coderabbitai bot commented Apr 3, 2026

Walkthrough

GetGraph now attempts a git-based repository fingerprint lookup (when force is false) before creating a zip. On fingerprint cache miss it falls back to the zip-hash flow; after analysis the result is cached under both the fingerprint-derived key (if available) and the zip-hash key.

Changes

Fingerprinting Infrastructure (internal/cache/fingerprint.go, internal/cache/fingerprint_test.go):
Added RepoFingerprint(dir string) (string, error) (uses git rev-parse, git status, git diff) and AnalysisKey(fingerprint, analysisType string) to produce stable cache keys. Tests validate clean/dirty/non-git behaviors and key determinism.

Handler Cache Integration (internal/analyze/handler.go):
GetGraph now does an early fingerprint-based cache lookup when force is false; on miss it proceeds with zip creation and zip-hash lookup. Successful analysis writes cache entries for both fingerprint and zip-hash keys. No exported signatures changed.

Sequence Diagram

sequenceDiagram
    participant Handler as GetGraph Handler
    participant GitFP as Repo Fingerprinter
    participant Git as Git Repo
    participant Cache as Cache Layer
    participant Analyzer as Analysis Engine

    Handler->>GitFP: RepoFingerprint(dir)
    GitFP->>Git: git rev-parse HEAD
    Git-->>GitFP: commit SHA
    GitFP->>Git: git status --porcelain --untracked-files=no
    Git-->>GitFP: status output
    alt clean
        GitFP-->>Handler: fingerprint (commitSHA)
    else dirty
        GitFP->>Git: git diff HEAD
        Git-->>GitFP: diff output
        GitFP-->>Handler: fingerprint (commitSHA:sha8(diff))
    end

    Handler->>Cache: AnalysisKey(fingerprint, "graph") → lookup
    Cache-->>Handler: hit/miss

    alt fingerprint cache hit
        Cache-->>Handler: cached graph
    else fingerprint miss
        Handler->>Analyzer: create archive & compute zip-hash
        Analyzer->>Cache: lookup(zip-hash)
        alt zip-hash hit
            Cache-->>Handler: cached graph
        else zip-hash miss
            Analyzer->>Analyzer: perform analysis
            Analyzer-->>Handler: new graph
            Handler->>Cache: write(fingerprint_key, graph) (if fingerprint available)
            Handler->>Cache: write(zip_hash_key, graph)
        end
    end

Estimated code review effort

🎯 3 (Moderate) | ⏱️ ~22 minutes

Suggested reviewers

  • greynewell

Poem

🔎 A repo's state, a tiny sign,
Commits and diffs condense to line,
We check the key, then zip if not,
Cache both ways so work isn't lost—
Fast hits, calm builds, devs can dine. 🍽️

🚥 Pre-merge checks | ✅ 2 | ❌ 1

❌ Failed checks (1 warning)

  • Docstring Coverage (⚠️ Warning): docstring coverage is 35.71%, below the required threshold of 80.00%. Resolution: write docstrings for the functions missing them.
✅ Passed checks (2 passed)
  • Title check (✅ Passed): the title accurately summarizes the main change: introducing git-based cache fingerprinting to enable fast cache lookups without creating a zip file first.
  • Description check (✅ Passed): the PR description is comprehensive and well-structured. It clearly explains the problem, solution, performance impact, implementation details, files changed, and includes a completed test plan with checkmarks.




@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 4

🧹 Nitpick comments (2)
internal/cache/fingerprint_test.go (1)

60-63: Optional: assert the actual dirty format, not just length.

Since the comment says “colon separator,” assert strings.Contains(fp, ":") directly to lock the contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/cache/fingerprint_test.go` around lines 60 - 63, The test currently
only checks fp length; also assert the expected colon separator by adding an
assertion like strings.Contains(fp, ":") in the test that defines fp
(fingerprint_test.go, inside the same test function where fp is produced);
import the strings package if missing and use t.Errorf or t.Fatalf to fail the
test with a clear message such as "expected dirty fingerprint to contain ':',
got %q", fp so the contract is enforced.
internal/analyze/handler.go (1)

31-36: Consider centralizing cache flow to avoid handler drift.

internal/analyze/handler.go now has fingerprint fast path, but internal/focus/handler.go:340-375 still does zip-hash only. A shared helper would keep behavior/perf consistent across commands.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/analyze/handler.go` around lines 31 - 36, The cache lookup/creation
logic is duplicated: GetGraph in internal/analyze/handler.go implements a
git-fingerprint fast path but internal/focus/handler.go still uses the older
zip-hash-only flow; extract the cache flow into a shared helper (e.g., a
function in an analyze/internal/cache package or exported in internal/analyze)
that accepts (ctx context.Context, cfg *config.Config, dir string, force bool)
and returns (*api.Graph, string, error), implementing the fingerprint-first
check and zip-on-miss behavior; then replace the zip-hash-only logic in
internal/focus/handler.go with a call to this new helper so both handlers share
identical cache/fingerprint behavior and signatures (update callers to use the
helper instead of their inline logic such as the current zip/hash code paths).
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/analyze/handler.go`:
- Around line 37-47: The code currently computes a repo fingerprint via
cache.RepoFingerprint(dir) and later recomputes it after zipping/uploading which
creates a TOCTOU risk; instead capture and reuse the initial fingerprint and the
derived cache.AnalysisKey (the 'fingerprint' and 'key' variables computed in the
fast-path block) for any subsequent cache operations and when storing the
resulting graph so you never call cache.RepoFingerprint(dir) a second time;
update the code paths that call cache.AnalysisKey or cache.Set/Get after
upload/analyze to reference the originally computed key and fingerprint (and
ensure any logging uses g.RepoID() alongside that same key) so the stored graph
and cache lookup use the identical identifier.

In `@internal/cache/fingerprint_test.go`:
- Around line 68-69: The tests call RepoFingerprint twice (and at other spots)
while discarding the returned error, which can hide failures; update each call
to RepoFingerprint in fingerprint_test.go (including the occurrences currently
at the locations referenced) to assert that err == nil immediately after the
call (e.g., using the test framework's require/Assert/if err != nil t.Fatalf),
so failures fail fast and do not allow empty/invalid fingerprints to be compared
as equal.

In `@internal/cache/fingerprint.go`:
- Around line 23-25: The current code returns commitSHA when running "git
status" fails, causing false clean fingerprints; instead propagate the error so
callers can fallback to zip-hash. In the function in
internal/cache/fingerprint.go where you check err after running git status (the
branch that currently does "if err != nil { return commitSHA, nil }"), replace
that return with returning an empty/zero fingerprint and the wrapped error
(e.g., return "", fmt.Errorf("git status failed: %w", err)) so callers receive
the failure; update any callers of this function to handle the returned error
path accordingly.
- Around line 22-37: The current fingerprint logic calls gitOutput(dir,
"status", "--porcelain", "--untracked-files=no") and therefore ignores untracked
files; change the git status call to use "--untracked-files=all" (i.e.
gitOutput(dir, "status", "--porcelain", "--untracked-files=all")), then inspect
the status output for lines beginning with "??" (untracked files) and, if any
are present, return a non-cacheable result (e.g. return an error to skip caching
or return commitSHA + ":dirty" per project convention) instead of treating the
repo as clean; keep the existing diff hashing path (gitOutput(dir, "diff",
"HEAD") and sha256 logic) for truly modified files.

---

Nitpick comments:
In `@internal/analyze/handler.go`:
- Around line 31-36: The cache lookup/creation logic is duplicated: GetGraph in
internal/analyze/handler.go implements a git-fingerprint fast path but
internal/focus/handler.go still uses the older zip-hash-only flow; extract the
cache flow into a shared helper (e.g., a function in an analyze/internal/cache
package or exported in internal/analyze) that accepts (ctx context.Context, cfg
*config.Config, dir string, force bool) and returns (*api.Graph, string, error),
implementing the fingerprint-first check and zip-on-miss behavior; then replace
the zip-hash-only logic in internal/focus/handler.go with a call to this new
helper so both handlers share identical cache/fingerprint behavior and
signatures (update callers to use the helper instead of their inline logic such
as the current zip/hash code paths).

In `@internal/cache/fingerprint_test.go`:
- Around line 60-63: The test currently only checks fp length; also assert the
expected colon separator by adding an assertion like strings.Contains(fp, ":")
in the test that defines fp (fingerprint_test.go, inside the same test function
where fp is produced); import the strings package if missing and use t.Errorf or
t.Fatalf to fail the test with a clear message such as "expected dirty
fingerprint to contain ':', got %q", fp so the contract is enforced.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: 8931ccb8-7700-4745-a0ec-58d60703a64c

📥 Commits

Reviewing files that changed from the base of the PR and between a653527 and 25c1c6e.

📒 Files selected for processing (3)
  • internal/analyze/handler.go
  • internal/cache/fingerprint.go
  • internal/cache/fingerprint_test.go

Comment on lines +68 to +69
fp1, _ := RepoFingerprint(dir)
fp2, _ := RepoFingerprint(dir)


⚠️ Potential issue | 🟡 Minor

These tests should fail fast on fingerprint errors.

At Line 68, Line 69, Line 77, and Line 85, ignored errors can mask regressions (e.g., empty fingerprints compared as equal). Please assert err == nil in each call.

🔧 Suggested test hardening
-	fp1, _ := RepoFingerprint(dir)
-	fp2, _ := RepoFingerprint(dir)
+	fp1, err := RepoFingerprint(dir)
+	if err != nil {
+		t.Fatal(err)
+	}
+	fp2, err := RepoFingerprint(dir)
+	if err != nil {
+		t.Fatal(err)
+	}
 	if fp1 != fp2 {
 		t.Errorf("fingerprint should be stable: %q != %q", fp1, fp2)
 	}

Also applies to: 77-77, 85-85

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/cache/fingerprint_test.go` around lines 68 - 69, The tests call
RepoFingerprint twice (and at other spots) while discarding the returned error,
which can hide failures; update each call to RepoFingerprint in
fingerprint_test.go (including the occurrences currently at the locations
referenced) to assert that err == nil immediately after the call (e.g., using
the test framework's require/Assert/if err != nil t.Fatalf), so failures fail
fast and do not allow empty/invalid fingerprints to be compared as equal.

Comment on lines +22 to +37
	dirty, err := gitOutput(dir, "status", "--porcelain", "--untracked-files=no")
	if err != nil {
		return commitSHA, nil
	}

	if dirty == "" {
		return commitSHA, nil
	}

	// Dirty: hash the diff to capture uncommitted changes.
	diff, err := gitOutput(dir, "diff", "HEAD")
	if err != nil {
		return commitSHA + ":dirty", nil
	}
	h := sha256.Sum256([]byte(diff))
	return commitSHA + ":" + hex.EncodeToString(h[:8]), nil


⚠️ Potential issue | 🔴 Critical

🧩 Analysis chain

🏁 Script executed:

#!/bin/bash
set -euo pipefail

tmp="$(mktemp -d)"
cd "$tmp"

git init -q
git config user.email "test@test.com"
git config user.name "test"

echo 'package main' > main.go
git add main.go
git commit -q -m "init"

echo '--- baseline ---'
git status --porcelain --untracked-files=no | sed -n '1,5p'
git diff HEAD | wc -c

echo 'package main // new file' > new.go

echo '--- after adding untracked file ---'
echo '[status with --untracked-files=no]'
git status --porcelain --untracked-files=no | sed -n '1,5p'
echo '[status with --untracked-files=all]'
git status --porcelain --untracked-files=all | sed -n '1,5p'
echo '[git diff HEAD bytes]'
git diff HEAD | wc -c

Repository: supermodeltools/cli

Length of output: 301


Untracked files bypass cache invalidation and cause stale results to be returned.

At lines 22 and 31–37, the code uses --untracked-files=no with git status and git diff HEAD to detect dirty repos. The test confirms this misses untracked files entirely: when you add a new untracked file like new.go, git status --porcelain --untracked-files=no returns empty—the file is invisible. Since git diff HEAD also ignores untracked files, the fingerprint stays as the plain commit SHA instead of switching to the :dirty variant. This means the cache will keep serving old analysis results even though the codebase has actually changed.

Concrete scenario: you commit code, run analysis (cached as commit SHA), then add a new untracked file, run analysis again—and get the stale cached result instead of re-analyzing.

The fix is straightforward: switch to --untracked-files=all and check for the ?? prefix (which marks untracked files), then return an error to skip caching.

Suggested fix
-	dirty, err := gitOutput(dir, "status", "--porcelain", "--untracked-files=no")
+	dirty, err := gitOutput(dir, "status", "--porcelain", "--untracked-files=all")
 	if err != nil {
 		return commitSHA, nil
 	}

 	if dirty == "" {
 		return commitSHA, nil
 	}
+
+	// Conservative: if untracked files exist, skip fingerprint caching to avoid stale hits.
+	if strings.Contains(dirty, "?? ") {
+		return "", fmt.Errorf("untracked files present")
+	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

	dirty, err := gitOutput(dir, "status", "--porcelain", "--untracked-files=no")
	if err != nil {
		return commitSHA, nil
	}
	if dirty == "" {
		return commitSHA, nil
	}
	// Dirty: hash the diff to capture uncommitted changes.
	diff, err := gitOutput(dir, "diff", "HEAD")
	if err != nil {
		return commitSHA + ":dirty", nil
	}
	h := sha256.Sum256([]byte(diff))
	return commitSHA + ":" + hex.EncodeToString(h[:8]), nil

After:

	dirty, err := gitOutput(dir, "status", "--porcelain", "--untracked-files=all")
	if err != nil {
		return commitSHA, nil
	}
	if dirty == "" {
		return commitSHA, nil
	}
	// Conservative: if untracked files exist, skip fingerprint caching to avoid stale hits.
	if strings.Contains(dirty, "?? ") {
		return "", fmt.Errorf("untracked files present")
	}
	// Dirty: hash the diff to capture uncommitted changes.
	diff, err := gitOutput(dir, "diff", "HEAD")
	if err != nil {
		return commitSHA + ":dirty", nil
	}
	h := sha256.Sum256([]byte(diff))
	return commitSHA + ":" + hex.EncodeToString(h[:8]), nil
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/cache/fingerprint.go` around lines 22 - 37, The current fingerprint
logic calls gitOutput(dir, "status", "--porcelain", "--untracked-files=no") and
therefore ignores untracked files; change the git status call to use
"--untracked-files=all" (i.e. gitOutput(dir, "status", "--porcelain",
"--untracked-files=all")), then inspect the status output for lines beginning
with "??" (untracked files) and, if any are present, return a non-cacheable
result (e.g. return an error to skip caching or return commitSHA + ":dirty" per
project convention) instead of treating the repo as clean; keep the existing
diff hashing path (gitOutput(dir, "diff", "HEAD") and sha256 logic) for truly
modified files.

Comment on lines +23 to +25
	if err != nil {
		return commitSHA, nil
	}


⚠️ Potential issue | 🟠 Major

Don’t treat git status failure as a clean fingerprint.

At Line 24, returning commitSHA on status failure can create false cache hits for unknown repo state. Prefer returning an error (or at least a dirty sentinel) so caller falls back to zip-hash safely.

✅ Minimal correctness fix
 	dirty, err := gitOutput(dir, "status", "--porcelain", "--untracked-files=all")
 	if err != nil {
-		return commitSHA, nil
+		return "", fmt.Errorf("git status failed: %w", err)
 	}
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change

Before:

	if err != nil {
		return commitSHA, nil
	}

After:

	if err != nil {
		return "", fmt.Errorf("git status failed: %w", err)
	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/cache/fingerprint.go` around lines 23 - 25, The current code returns
commitSHA when running "git status" fails, causing false clean fingerprints;
instead propagate the error so callers can fallback to zip-hash. In the function
in internal/cache/fingerprint.go where you check err after running git status
(the branch that currently does "if err != nil { return commitSHA, nil }"),
replace that return with returning an empty/zero fingerprint and the wrapped
error (e.g., return "", fmt.Errorf("git status failed: %w", err)) so callers
receive the failure; update any callers of this function to handle the returned
error path accordingly.

Add RepoFingerprint() that uses git rev-parse HEAD (~1ms) for clean
repos and git diff hash (~100ms) for dirty repos. GetGraph() now
checks the cache before creating a zip — cache hits skip zip creation
entirely.

Before: every command created a zip and hashed it (1-10s).
After: cache hit in 35ms (1800x faster on supermodel-public-api).

New functions:
- cache.RepoFingerprint(dir) — fast repo state key
- cache.AnalysisKey(fingerprint, type) — typed cache key

7 new tests for fingerprint (clean, dirty, stable, changes-after-commit,
not-git, analysis key differentiation, stability).

Partial supermodeltools#33.
@jonathanpopham jonathanpopham force-pushed the feat/cache-fingerprint branch from 08e026d to b80e3c1 on April 3, 2026 at 21:36

@coderabbitai coderabbitai bot left a comment


Actionable comments posted: 1

♻️ Duplicate comments (1)
internal/analyze/handler.go (1)

79-86: ⚠️ Potential issue | 🟠 Major

Don't recompute the fingerprint after the archive is frozen.

Line 80 calls RepoFingerprint again after createZip and Analyze. Example: archive state A, someone edits the repo during upload, Line 80 fingerprints state B, and now graph A is cached under key B. The next fingerprint hit can serve the wrong graph. Reuse the original fingerprint key here instead of recomputing it.

🛠️ Suggested fix
-	// Cache under both keys: fingerprint (fast lookup) and zip hash (fallback).
-	fingerprint, fpErr := cache.RepoFingerprint(dir)
-	if fpErr == nil {
-		fpKey := cache.AnalysisKey(fingerprint, "graph")
-		if err := cache.Put(fpKey, g); err != nil {
-			ui.Warn("could not write cache: %v", err)
-		}
-	}
+	// Cache under both keys: fingerprint (fast lookup) and zip hash (fallback).
+	if fpKey != "" {
+		if err := cache.Put(fpKey, g); err != nil {
+			ui.Warn("could not write cache: %v", err)
+		}
+	}
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/analyze/handler.go` around lines 79 - 86, The code recomputes the
repo fingerprint after creating the archive and analyzing it, which can mismatch
the archived state; instead, reuse the original fingerprint computed before
createZip/Analyze (or compute it once before freezing) when building fpKey with
cache.AnalysisKey and calling cache.Put, i.e., ensure the fpKey uses the
pre-freeze fingerprint variable (the same value used to identify the archive)
rather than calling RepoFingerprint again.
🤖 Prompt for all review comments with AI agents
Verify each finding against the current code and only fix it if needed.

Inline comments:
In `@internal/analyze/handler.go`:
- Around line 37-47: The code returns different cache keys for the same repo
state because the legacy zip-hash path and the fingerprint fast-path produce
different keys; update GetGraph so it always returns one canonical key based on
RepoFingerprint when available: when RepoFingerprint(dir) succeeds compute fpKey
:= cache.AnalysisKey(fingerprint, "graph") and use/return that key; if you hit a
graph under the legacy zip/hash key, backfill the fingerprint entry by storing
the same graph under fpKey (e.g. cache.Set(fpKey, graph) or the equivalent cache
API) before returning so subsequent calls return the canonical fpKey. Ensure
references: RepoFingerprint, AnalysisKey, GetGraph, and the cache get/set calls
are updated accordingly.

---

Duplicate comments:
In `@internal/analyze/handler.go`:
- Around line 79-86: The code recomputes the repo fingerprint after creating the
archive and analyzing it, which can mismatch the archived state; instead, reuse
the original fingerprint computed before createZip/Analyze (or compute it once
before freezing) when building fpKey with cache.AnalysisKey and calling
cache.Put, i.e., ensure the fpKey uses the pre-freeze fingerprint variable (the
same value used to identify the archive) rather than calling RepoFingerprint
again.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: e63804fb-b00f-4357-910a-0d70db8d0ccd

📥 Commits

Reviewing files that changed from the base of the PR and between 08e026d and b80e3c1.

📒 Files selected for processing (3)
  • internal/analyze/handler.go
  • internal/cache/fingerprint.go
  • internal/cache/fingerprint_test.go
✅ Files skipped from review due to trivial changes (1)
  • internal/cache/fingerprint_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
  • internal/cache/fingerprint.go

Comment on lines +37 to +47
	// Fast path: check cache using git fingerprint before creating zip.
	if !force {
		fingerprint, err := cache.RepoFingerprint(dir)
		if err == nil {
			key := cache.AnalysisKey(fingerprint, "graph")
			if g, _ := cache.Get(key); g != nil {
				ui.Success("Using cached analysis (repoId: %s)", g.RepoID())
				return g, key, nil
			}
		}
	}


⚠️ Potential issue | 🟠 Major

Return one canonical key for the same repo state.

Simple repro: on a cold cache, the first call returns hash at Line 92; the second call on the unchanged repo returns fpKey from Lines 41-44. That breaks the current GetGraph contract in internal/analyze/integration_test.go:64-65, which expects both calls to yield the same key. It also means a legacy zip-hash hit at Lines 65-67 never gets upgraded to the fast path because the fingerprint entry is never backfilled.

💡 Suggested shape
 func GetGraph(ctx context.Context, cfg *config.Config, dir string, force bool) (*api.Graph, string, error) {
-	// Fast path: check cache using git fingerprint before creating zip.
-	if !force {
-		fingerprint, err := cache.RepoFingerprint(dir)
-		if err == nil {
-			key := cache.AnalysisKey(fingerprint, "graph")
-			if g, _ := cache.Get(key); g != nil {
-				ui.Success("Using cached analysis (repoId: %s)", g.RepoID())
-				return g, key, nil
-			}
-		}
-	}
+	var fpKey string
+	if fingerprint, err := cache.RepoFingerprint(dir); err == nil {
+		fpKey = cache.AnalysisKey(fingerprint, "graph")
+		if !force {
+			if g, _ := cache.Get(fpKey); g != nil {
+				ui.Success("Using cached analysis (repoId: %s)", g.RepoID())
+				return g, fpKey, nil
+			}
+		}
+	}
@@
 	if !force {
 		if g, _ := cache.Get(hash); g != nil {
+			if fpKey != "" {
+				if err := cache.Put(fpKey, g); err != nil {
+					ui.Warn("could not write cache: %v", err)
+				}
+				ui.Success("Using cached analysis (repoId: %s)", g.RepoID())
+				return g, fpKey, nil
+			}
 			ui.Success("Using cached analysis (repoId: %s)", g.RepoID())
 			return g, hash, nil
 		}
 	}
@@
-	return g, hash, nil
+	if fpKey != "" {
+		return g, fpKey, nil
+	}
+	return g, hash, nil
 }

Also applies to: 63-68, 91-92

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.

In `@internal/analyze/handler.go` around lines 37 - 47, The code returns different
cache keys for the same repo state because the legacy zip-hash path and the
fingerprint fast-path produce different keys; update GetGraph so it always
returns one canonical key based on RepoFingerprint when available: when
RepoFingerprint(dir) succeeds compute fpKey := cache.AnalysisKey(fingerprint,
"graph") and use/return that key; if you hit a graph under the legacy zip/hash
key, backfill the fingerprint entry by storing the same graph under fpKey (e.g.
cache.Set(fpKey, graph) or the equivalent cache API) before returning so
subsequent calls return the canonical fpKey. Ensure references: RepoFingerprint,
AnalysisKey, GetGraph, and the cache get/set calls are updated accordingly.

@jonathanpopham jonathanpopham marked this pull request as ready for review April 3, 2026 22:24
@jonathanpopham jonathanpopham merged commit 2a498c0 into supermodeltools:main Apr 3, 2026
7 checks passed
