Skip to content

fix(flatkv): purge stale data before state sync import#3090

Merged
blindchaser merged 6 commits intomainfrom
yiren/state-sync-stale
Mar 23, 2026
Merged

fix(flatkv): purge stale data before state sync import#3090
blindchaser merged 6 commits intomainfrom
yiren/state-sync-stale

Conversation

@blindchaser
Copy link
Contributor

Summary

FlatKV state sync import previously only upserted snapshot key-value pairs into the existing store. Keys that existed locally but were absent from the remote snapshot survived the import, producing a mixed state with stale data. This PR introduces a full data purge before import so the resulting store exactly matches the snapshot.

  • sei-db/state_db/sc/flatkv/store.go: Importer() now calls resetForImport() instead of conditionally reopening a closed store. resetForImport closes all DB handles, re-acquires the file lock (released by rootmulti.Restore's earlier Close()), then removes the working directory, all snapshot directories, the current symlink, and the changelog WAL. It reopens from a pristine empty state and resets in-memory LtHash/version to zero. Deletion failures are logged but do not abort the import. closeDBsOnly errors are now propagated instead of silently discarded, matching the error-handling pattern in Rollback.

Test plan

  • sei-db/state_db/sc/flatkv/exporter_test.go:
    • TestImportPurgesStaleData: Populates a store with keys across all four underlying PebbleDBs (storage, account/nonce, account/codehash, code) including stale keys for a third address. Builds a snapshot containing only two of the three addresses. Imports the snapshot into the pre-populated store and verifies: (1) stale keys from every DB type are absent, (2) active keys carry the new snapshot values, (3) LtHash matches the source, (4) the store survives a close/reopen cycle with stale keys still absent and LtHash intact.

Importer() previously reopened the existing store and wrote snapshot
data via upsert. Keys that existed locally but were deleted in the
remote snapshot survived the import, producing a mixed stale state.

Add resetForImport() which removes all snapshots, the working
directory, and the WAL before reopening from an empty store. This
ensures the imported state exactly matches the remote snapshot.
Also re-acquire the file lock before mutating the data directory
since rootmulti.Restore() releases it via Close() before calling
Importer().
@blindchaser blindchaser force-pushed the yiren/state-sync-stale branch from bba608b to c2f0789 Compare March 19, 2026 20:34
@github-actions
Copy link

github-actions bot commented Mar 19, 2026

The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).

BuildFormatLintBreakingUpdated (UTC)
✅ passed✅ passed✅ passed✅ passedMar 23, 2026, 6:06 PM

Copy link

@chatgpt-codex-connector chatgpt-codex-connector bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

💡 Codex Review

Here are some automated review suggestions for this pull request.

Reviewed commit: c2f07892ab

ℹ️ About Codex in GitHub

Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you

  • Open a pull request for review
  • Mark a draft as ready
  • Comment "@codex review".

If Codex has suggestions, it will comment; otherwise it will react with 👍.

Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".

Comment on lines +571 to +572
if err := s.resetForImport(); err != nil {
return nil, fmt.Errorf("reset store for import: %w", err)

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Badge Make import reset atomic with respect to restore failures

Importer() now purges the on-disk store before the first snapshot item is read. In sei-cosmos/storev2/rootmulti/store.go's restore loop, any decode/validation failure sets restoreErr but still calls scImporter.Close(), which finalizes whatever nodes were imported so far. That means an interrupted or corrupt state-sync stream now deletes the previously valid FlatKV state and replaces it with a partial (or even empty) snapshot at height, instead of failing without destroying the old store.

Useful? React with 👍 / 👎.

Comment on lines +600 to +603
logRemoveErr := func(path string, err error) {
if err != nil {
logger.Error("resetForImport: failed to remove", "path", path, "err", err)
}

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 Badge Fail the import if stale snapshots or WAL cannot be removed

The new purge path only logs removal errors and continues. If working/, any snapshot-*, or changelog/ cannot be deleted (for example due to permissions or a lingering external handle), resetForImport still reopens the DB and reports success. In that case open() can import on top of surviving snapshot data, and the next LoadVersion() will replay the old WAL in catchup(), recreating exactly the mixed pre-sync state this patch is meant to prevent.

Useful? React with 👍 / 👎.

@codecov
Copy link

codecov bot commented Mar 19, 2026

Codecov Report

❌ Patch coverage is 48.38710% with 16 lines in your changes missing coverage. Please review.
✅ Project coverage is 58.73%. Comparing base (9dc8d6e) to head (1f62782).
⚠️ Report is 1 commits behind head on main.

Files with missing lines Patch % Lines
sei-db/state_db/sc/flatkv/store.go 48.38% 9 Missing and 7 partials ⚠️
Additional details and impacted files

Impacted file tree graph

@@            Coverage Diff             @@
##             main    #3090      +/-   ##
==========================================
+ Coverage   58.59%   58.73%   +0.13%     
==========================================
  Files        2096     2094       -2     
  Lines      173372   172906     -466     
==========================================
- Hits       101583   101551      -32     
+ Misses      62746    62384     -362     
+ Partials     9043     8971      -72     
Flag Coverage Δ
sei-chain-pr 68.24% <48.38%> (?)
sei-db 70.41% <ø> (ø)

Flags with carried forward coverage won't be shown. Click here to find out more.

Files with missing lines Coverage Δ
sei-db/state_db/sc/flatkv/store.go 70.40% <48.38%> (-2.15%) ⬇️

... and 52 files with indirect coverage changes

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.
  • 📦 JS Bundle Analysis: Save yourself from yourself by tracking and limiting bundle sizes in JS merges.

Comment on lines +601 to +602
if err != nil {
logger.Error("resetForImport: failed to remove", "path", path, "err", err)
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If failure to remove data can result in a corrupted final state, then I think it's better to return an error and stop the sync. IMO, an offline machine is better than an online one with high probability of corruption.

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

makes sense, updated

@blindchaser blindchaser added this pull request to the merge queue Mar 23, 2026
Merged via the queue into main with commit 491d2c6 Mar 23, 2026
38 checks passed
@blindchaser blindchaser deleted the yiren/state-sync-stale branch March 23, 2026 20:56
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants