feat: git-based cache fingerprinting for instant cache lookups #42
Walkthrough: GetGraph now attempts a git-based repository fingerprint lookup (when a fingerprint can be computed) before falling back to the zip-hash flow shown below.
Sequence Diagram

```mermaid
sequenceDiagram
    participant Handler as GetGraph Handler
    participant GitFP as Repo Fingerprinter
    participant Git as Git Repo
    participant Cache as Cache Layer
    participant Analyzer as Analysis Engine
    Handler->>GitFP: RepoFingerprint(dir)
    GitFP->>Git: git rev-parse HEAD
    Git-->>GitFP: commit SHA
    GitFP->>Git: git status --porcelain --untracked-files=no
    Git-->>GitFP: status output
    alt clean
        GitFP-->>Handler: fingerprint (commitSHA)
    else dirty
        GitFP->>Git: git diff HEAD
        Git-->>GitFP: diff output
        GitFP-->>Handler: fingerprint (commitSHA:sha8(diff))
    end
    Handler->>Cache: AnalysisKey(fingerprint, "graph") lookup
    Cache-->>Handler: hit/miss
    alt fingerprint cache hit
        Cache-->>Handler: cached graph
    else fingerprint miss
        Handler->>Analyzer: create archive & compute zip-hash
        Analyzer->>Cache: lookup(zip-hash)
        alt zip-hash hit
            Cache-->>Handler: cached graph
        else zip-hash miss
            Analyzer->>Analyzer: perform analysis
            Analyzer-->>Handler: new graph
            Handler->>Cache: write(fingerprint_key, graph) (if fingerprint available)
            Handler->>Cache: write(zip_hash_key, graph)
        end
    end
```
Estimated code review effort: 🎯 3 (Moderate) | ⏱️ ~22 minutes
Actionable comments posted: 4
🧹 Nitpick comments (2)
internal/cache/fingerprint_test.go (1)
60-63: Optional: assert the actual dirty format, not just length. Since the comment says "colon separator," assert `strings.Contains(fp, ":")` directly to lock the contract.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/cache/fingerprint_test.go` around lines 60 - 63, The test currently only checks fp length; also assert the expected colon separator by adding an assertion like strings.Contains(fp, ":") in the test that defines fp (fingerprint_test.go, inside the same test function where fp is produced); import the strings package if missing and use t.Errorf or t.Fatalf to fail the test with a clear message such as "expected dirty fingerprint to contain ':', got %q", fp so the contract is enforced.

internal/analyze/handler.go (1)
31-36: Consider centralizing cache flow to avoid handler drift.
`internal/analyze/handler.go` now has a fingerprint fast path, but `internal/focus/handler.go:340-375` still does zip-hash only. A shared helper would keep behavior/perf consistent across commands.

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/analyze/handler.go` around lines 31 - 36, The cache lookup/creation logic is duplicated: GetGraph in internal/analyze/handler.go implements a git-fingerprint fast path but internal/focus/handler.go still uses the older zip-hash-only flow; extract the cache flow into a shared helper (e.g., a function in an analyze/internal/cache package or exported in internal/analyze) that accepts (ctx context.Context, cfg *config.Config, dir string, force bool) and returns (*api.Graph, string, error), implementing the fingerprint-first check and zip-on-miss behavior; then replace the zip-hash-only logic in internal/focus/handler.go with a call to this new helper so both handlers share identical cache/fingerprint behavior and signatures (update callers to use the helper instead of their inline logic such as the current zip/hash code paths).
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: 8931ccb8-7700-4745-a0ec-58d60703a64c
📒 Files selected for processing (3)
- internal/analyze/handler.go
- internal/cache/fingerprint.go
- internal/cache/fingerprint_test.go
```go
fp1, _ := RepoFingerprint(dir)
fp2, _ := RepoFingerprint(dir)
```
These tests should fail fast on fingerprint errors.
At Line 68, Line 69, Line 77, and Line 85, ignored errors can mask regressions (e.g., empty fingerprints compared as equal). Please assert err == nil in each call.
🔧 Suggested test hardening
```diff
-fp1, _ := RepoFingerprint(dir)
-fp2, _ := RepoFingerprint(dir)
+fp1, err := RepoFingerprint(dir)
+if err != nil {
+	t.Fatal(err)
+}
+fp2, err := RepoFingerprint(dir)
+if err != nil {
+	t.Fatal(err)
+}
 if fp1 != fp2 {
 	t.Errorf("fingerprint should be stable: %q != %q", fp1, fp2)
 }
```

Also applies to: 77-77, 85-85
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/cache/fingerprint_test.go` around lines 68 - 69, The tests call
RepoFingerprint twice (and at other spots) while discarding the returned error,
which can hide failures; update each call to RepoFingerprint in
fingerprint_test.go (including the occurrences currently at the locations
referenced) to assert that err == nil immediately after the call (e.g., using
the test framework's require/Assert/if err != nil t.Fatalf), so failures fail
fast and do not allow empty/invalid fingerprints to be compared as equal.
```go
dirty, err := gitOutput(dir, "status", "--porcelain", "--untracked-files=no")
if err != nil {
	return commitSHA, nil
}

if dirty == "" {
	return commitSHA, nil
}

// Dirty: hash the diff to capture uncommitted changes.
diff, err := gitOutput(dir, "diff", "HEAD")
if err != nil {
	return commitSHA + ":dirty", nil
}
h := sha256.Sum256([]byte(diff))
return commitSHA + ":" + hex.EncodeToString(h[:8]), nil
```
🧩 Analysis chain
🏁 Script executed:
```bash
#!/bin/bash
set -euo pipefail
tmp="$(mktemp -d)"
cd "$tmp"
git init -q
git config user.email "test@test.com"
git config user.name "test"
echo 'package main' > main.go
git add main.go
git commit -q -m "init"
echo '--- baseline ---'
git status --porcelain --untracked-files=no | sed -n '1,5p'
git diff HEAD | wc -c
echo 'package main // new file' > new.go
echo '--- after adding untracked file ---'
echo '[status with --untracked-files=no]'
git status --porcelain --untracked-files=no | sed -n '1,5p'
echo '[status with --untracked-files=all]'
git status --porcelain --untracked-files=all | sed -n '1,5p'
echo '[git diff HEAD bytes]'
git diff HEAD | wc -c
```

Repository: supermodeltools/cli
Length of output: 301
Untracked files bypass cache invalidation and cause stale results to be returned.
At lines 22 and 31–37, the code uses --untracked-files=no with git status and git diff HEAD to detect dirty repos. The test confirms this misses untracked files entirely: when you add a new untracked file like new.go, git status --porcelain --untracked-files=no returns empty—the file is invisible. Since git diff HEAD also ignores untracked files, the fingerprint stays as the plain commit SHA instead of switching to the :dirty variant. This means the cache will keep serving old analysis results even though the codebase has actually changed.
Concrete scenario: you commit code, run analysis (cached as commit SHA), then add a new untracked file, run analysis again—and get the stale cached result instead of re-analyzing.
The fix is straightforward: switch to --untracked-files=all and check for the ?? prefix (which marks untracked files), then return an error to skip caching.
Suggested fix
```diff
-	dirty, err := gitOutput(dir, "status", "--porcelain", "--untracked-files=no")
+	dirty, err := gitOutput(dir, "status", "--porcelain", "--untracked-files=all")
 	if err != nil {
 		return commitSHA, nil
 	}
 	if dirty == "" {
 		return commitSHA, nil
 	}
+
+	// Conservative: if untracked files exist, skip fingerprint caching to avoid stale hits.
+	if strings.Contains(dirty, "?? ") {
+		return "", fmt.Errorf("untracked files present")
+	}
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/cache/fingerprint.go` around lines 22 - 37, The current fingerprint
logic calls gitOutput(dir, "status", "--porcelain", "--untracked-files=no") and
therefore ignores untracked files; change the git status call to use
"--untracked-files=all" (i.e. gitOutput(dir, "status", "--porcelain",
"--untracked-files=all")), then inspect the status output for lines beginning
with "??" (untracked files) and, if any are present, return a non-cacheable
result (e.g. return an error to skip caching or return commitSHA + ":dirty" per
project convention) instead of treating the repo as clean; keep the existing
diff hashing path (gitOutput(dir, "diff", "HEAD") and sha256 logic) for truly
modified files.
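The untracked-file detection the prompt describes can be sketched as a small pure helper. `hasUntracked` is a hypothetical name, not code from the PR; it does a per-line `??` prefix check, which is slightly stricter than the `strings.Contains(dirty, "?? ")` shown in the suggested diff (a mid-line `?? ` in a filename cannot trigger a false positive):

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// hasUntracked reports whether `git status --porcelain --untracked-files=all`
// output lists any untracked files, which git marks with a "??" line prefix.
func hasUntracked(status string) bool {
	sc := bufio.NewScanner(strings.NewReader(status))
	for sc.Scan() {
		if strings.HasPrefix(sc.Text(), "??") {
			return true
		}
	}
	return false
}

func main() {
	fmt.Println(hasUntracked("?? new.go\n M main.go")) // untracked file present
	fmt.Println(hasUntracked(" M main.go"))            // modified file only
}
```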
```go
if err != nil {
	return commitSHA, nil
}
```
Don’t treat git status failure as a clean fingerprint.
At Line 24, returning commitSHA on status failure can create false cache hits for an unknown repo state. Prefer returning an error (or at least a dirty sentinel) so the caller falls back to zip-hash safely.
✅ Minimal correctness fix
```diff
 dirty, err := gitOutput(dir, "status", "--porcelain", "--untracked-files=all")
 if err != nil {
-	return commitSHA, nil
+	return "", fmt.Errorf("git status failed: %w", err)
 }
```
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/cache/fingerprint.go` around lines 23 - 25, The current code returns
commitSHA when running "git status" fails, causing false clean fingerprints;
instead propagate the error so callers can fallback to zip-hash. In the function
in internal/cache/fingerprint.go where you check err after running git status
(the branch that currently does "if err != nil { return commitSHA, nil }"),
replace that return with returning an empty/zero fingerprint and the wrapped
error (e.g., return "", fmt.Errorf("git status failed: %w", err)) so callers
receive the failure; update any callers of this function to handle the returned
error path accordingly.
Add RepoFingerprint() that uses git rev-parse HEAD (~1ms) for clean repos and git diff hash (~100ms) for dirty repos. GetGraph() now checks the cache before creating a zip — cache hits skip zip creation entirely.

Before: every command created a zip and hashed it (1-10s). After: cache hit in 35ms (1800x faster on supermodel-public-api).

New functions:
- cache.RepoFingerprint(dir) — fast repo state key
- cache.AnalysisKey(fingerprint, type) — typed cache key

7 new tests for fingerprint (clean, dirty, stable, changes-after-commit, not-git, analysis key differentiation, stability).

Partial supermodeltools#33.
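The commit message introduces `cache.AnalysisKey(fingerprint, type)`, but this page never shows its body. A minimal sketch of what such a typed key derivation might look like, assuming SHA-256 over the fingerprint plus the analysis type (the actual implementation in `internal/cache/fingerprint.go` may differ):

```go
package main

import (
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// AnalysisKey derives a typed cache key from a repo fingerprint and an
// analysis type so different analyses of the same repo state never collide.
// The exact hashing scheme ("fingerprint:type" through SHA-256) is assumed.
func AnalysisKey(fingerprint, analysisType string) string {
	h := sha256.Sum256([]byte(fingerprint + ":" + analysisType))
	return hex.EncodeToString(h[:])
}

func main() {
	fmt.Println(AnalysisKey("abc123", "graph"))
	fmt.Println(AnalysisKey("abc123:deadbeef01234567", "graph")) // dirty-repo fingerprint
}
```

Hashing the type into the key (rather than concatenating it) keeps keys a fixed length regardless of fingerprint shape.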
08e026d to b80e3c1 (Compare)
Actionable comments posted: 1
♻️ Duplicate comments (1)
internal/analyze/handler.go (1)
79-86: ⚠️ Potential issue | 🟠 Major
Don't recompute the fingerprint after the archive is frozen.
Line 80 calls RepoFingerprint again after createZip and Analyze. Example: archive state A, someone edits the repo during upload, Line 80 fingerprints state B, and now graph A is cached under key B. The next fingerprint hit can serve the wrong graph. Reuse the original fingerprint key here instead of recomputing it.

🛠️ Suggested fix
```diff
-	// Cache under both keys: fingerprint (fast lookup) and zip hash (fallback).
-	fingerprint, fpErr := cache.RepoFingerprint(dir)
-	if fpErr == nil {
-		fpKey := cache.AnalysisKey(fingerprint, "graph")
-		if err := cache.Put(fpKey, g); err != nil {
-			ui.Warn("could not write cache: %v", err)
-		}
-	}
+	// Cache under both keys: fingerprint (fast lookup) and zip hash (fallback).
+	if fpKey != "" {
+		if err := cache.Put(fpKey, g); err != nil {
+			ui.Warn("could not write cache: %v", err)
+		}
+	}
```

🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed. In `@internal/analyze/handler.go` around lines 79 - 86, The code recomputes the repo fingerprint after creating the archive and analyzing it, which can mismatch the archived state; instead, reuse the original fingerprint computed before createZip/Analyze (or compute it once before freezing) when building fpKey with cache.AnalysisKey and calling cache.Put, i.e., ensure the fpKey uses the pre-freeze fingerprint variable (the same value used to identify the archive) rather than calling RepoFingerprint again.
ℹ️ Review info
⚙️ Run configuration
Configuration used: Organization UI
Review profile: CHILL
Plan: Pro
Run ID: e63804fb-b00f-4357-910a-0d70db8d0ccd
📒 Files selected for processing (3)
- internal/analyze/handler.go
- internal/cache/fingerprint.go
- internal/cache/fingerprint_test.go
✅ Files skipped from review due to trivial changes (1)
- internal/cache/fingerprint_test.go
🚧 Files skipped from review as they are similar to previous changes (1)
- internal/cache/fingerprint.go
```go
// Fast path: check cache using git fingerprint before creating zip.
if !force {
	fingerprint, err := cache.RepoFingerprint(dir)
	if err == nil {
		key := cache.AnalysisKey(fingerprint, "graph")
		if g, _ := cache.Get(key); g != nil {
			ui.Success("Using cached analysis (repoId: %s)", g.RepoID())
			return g, key, nil
		}
	}
}
```
Return one canonical key for the same repo state.
Simple repro: on a cold cache, the first call returns hash at Line 92; the second call on the unchanged repo returns fpKey from Lines 41-44. That breaks the current GetGraph contract in internal/analyze/integration_test.go:64-65, which expects both calls to yield the same key. It also means a legacy zip-hash hit at Lines 65-67 never gets upgraded to the fast path because the fingerprint entry is never backfilled.
💡 Suggested shape
```diff
 func GetGraph(ctx context.Context, cfg *config.Config, dir string, force bool) (*api.Graph, string, error) {
-	// Fast path: check cache using git fingerprint before creating zip.
-	if !force {
-		fingerprint, err := cache.RepoFingerprint(dir)
-		if err == nil {
-			key := cache.AnalysisKey(fingerprint, "graph")
-			if g, _ := cache.Get(key); g != nil {
-				ui.Success("Using cached analysis (repoId: %s)", g.RepoID())
-				return g, key, nil
-			}
-		}
-	}
+	var fpKey string
+	if fingerprint, err := cache.RepoFingerprint(dir); err == nil {
+		fpKey = cache.AnalysisKey(fingerprint, "graph")
+		if !force {
+			if g, _ := cache.Get(fpKey); g != nil {
+				ui.Success("Using cached analysis (repoId: %s)", g.RepoID())
+				return g, fpKey, nil
+			}
+		}
+	}
@@
 	if !force {
 		if g, _ := cache.Get(hash); g != nil {
+			if fpKey != "" {
+				if err := cache.Put(fpKey, g); err != nil {
+					ui.Warn("could not write cache: %v", err)
+				}
+				ui.Success("Using cached analysis (repoId: %s)", g.RepoID())
+				return g, fpKey, nil
+			}
 			ui.Success("Using cached analysis (repoId: %s)", g.RepoID())
 			return g, hash, nil
 		}
 	}
@@
-	return g, hash, nil
+	if fpKey != "" {
+		return g, fpKey, nil
+	}
+	return g, hash, nil
 }
```

Also applies to: 63-68, 91-92
🤖 Prompt for AI Agents
Verify each finding against the current code and only fix it if needed.
In `@internal/analyze/handler.go` around lines 37 - 47, The code returns different
cache keys for the same repo state because the legacy zip-hash path and the
fingerprint fast-path produce different keys; update GetGraph so it always
returns one canonical key based on RepoFingerprint when available: when
RepoFingerprint(dir) succeeds compute fpKey := cache.AnalysisKey(fingerprint,
"graph") and use/return that key; if you hit a graph under the legacy zip/hash
key, backfill the fingerprint entry by storing the same graph under fpKey (e.g.
cache.Set(fpKey, graph) or the equivalent cache API) before returning so
subsequent calls return the canonical fpKey. Ensure references: RepoFingerprint,
AnalysisKey, GetGraph, and the cache get/set calls are updated accordingly.
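The canonical-key behavior this prompt describes (fingerprint-first lookup, zip-hash fallback with backfill, one stable key for a given repo state) can be sketched as a dependency-injected helper. All names here (`getOrCompute`, `lookupResult`) are illustrative, not the PR's; the real handler works with `*api.Graph` and the concrete `cache` package rather than injected functions, and a string stands in for the graph:

```go
package main

import "fmt"

// lookupResult pairs a graph (string stand-in for *api.Graph) with the
// canonical key it is cached under.
type lookupResult struct {
	graph string
	key   string
}

// getOrCompute sketches the canonical-key flow: fingerprint-first lookup,
// zip-hash fallback with backfill, analysis on a full miss. Collaborators
// are injected so the control flow can be exercised in isolation.
func getOrCompute(
	fingerprintKey func() (string, error), // e.g. AnalysisKey(RepoFingerprint(dir), "graph")
	zipHashKey func() (string, error), // hash of the created archive
	cacheGet func(key string) (string, bool),
	cachePut func(key, graph string),
	analyze func() (string, error),
	force bool,
) (lookupResult, error) {
	var fpKey string
	if k, err := fingerprintKey(); err == nil {
		fpKey = k
		if !force {
			if g, ok := cacheGet(fpKey); ok {
				return lookupResult{g, fpKey}, nil
			}
		}
	}
	zipKey, err := zipHashKey()
	if err != nil {
		return lookupResult{}, err
	}
	canonical := zipKey // fall back to zip hash only when no fingerprint exists
	if fpKey != "" {
		canonical = fpKey
	}
	if !force {
		if g, ok := cacheGet(zipKey); ok {
			if fpKey != "" {
				cachePut(fpKey, g) // backfill so the next call takes the fast path
			}
			return lookupResult{g, canonical}, nil
		}
	}
	g, err := analyze()
	if err != nil {
		return lookupResult{}, err
	}
	if fpKey != "" {
		cachePut(fpKey, g) // reuse the pre-archive fingerprint; never recompute here
	}
	cachePut(zipKey, g)
	return lookupResult{g, canonical}, nil
}

func main() {
	store := map[string]string{}
	get := func(k string) (string, bool) { g, ok := store[k]; return g, ok }
	put := func(k, g string) { store[k] = g }
	r, _ := getOrCompute(
		func() (string, error) { return "fp-key", nil },
		func() (string, error) { return "zip-key", nil },
		get, put,
		func() (string, error) { return "graph", nil },
		false,
	)
	fmt.Println(r.key)
}
```

Computing the fingerprint once, before the archive is created, also sidesteps the TOCTOU issue the major comment above raises.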
Partial #33.

Problem

Every command created a zip and SHA256'd it before checking the cache. On supermodel-public-api, this took 1-10 seconds, even when the result was already cached.

Fix

Use `git rev-parse HEAD` (~1ms) to check the cache before creating a zip. Only create and upload the zip on a cache miss.

Performance

1800x faster cache hits on supermodel-public-api.

How it works

- `cache.RepoFingerprint(dir)` — returns commit SHA for clean repos (~1ms), commitSHA:diffHash for dirty repos (~100ms)
- `cache.AnalysisKey(fingerprint, "graph")` — hashes fingerprint + analysis type into a cache key
- `analyze.GetGraph()` — checks fingerprint cache first, falls back to zip-hash cache, only uploads on full miss

Changes

- `internal/cache/fingerprint.go` — new `RepoFingerprint()`, `AnalysisKey()`
- `internal/cache/fingerprint_test.go` — 7 tests (clean, dirty, stable, commit changes, not-git, key types)
- `internal/analyze/handler.go` — `GetGraph()` uses fingerprint fast path

Test plan

- `go test ./...` — all 14 packages pass
- `supermodel analyze --force` (63s) then `supermodel analyze` (0.035s)

Summary by CodeRabbit
New Features
Tests