fix(flatkv): purge stale data before state sync import#3090
fix(flatkv): purge stale data before state sync import#3090blindchaser merged 6 commits intomainfrom
Conversation
Importer() previously reopened the existing store and wrote snapshot data via upsert. Keys that existed locally but were deleted in the remote snapshot survived the import, producing a mixed stale state. Add resetForImport() which removes all snapshots, the working directory, and the WAL before reopening from an empty store. This ensures the imported state exactly matches the remote snapshot. Also re-acquire the file lock before mutating the data directory since rootmulti.Restore() releases it via Close() before calling Importer().
bba608b to
c2f0789
Compare
|
The latest Buf updates on your PR. Results from workflow Buf / buf (pull_request).
|
There was a problem hiding this comment.
💡 Codex Review
Here are some automated review suggestions for this pull request.
Reviewed commit: c2f07892ab
ℹ️ About Codex in GitHub
Your team has set up Codex to review pull requests in this repo. Reviews are triggered when you
- Open a pull request for review
- Mark a draft as ready
- Comment "@codex review".
If Codex has suggestions, it will comment; otherwise it will react with 👍.
Codex can also answer questions or update the PR. Try commenting "@codex address that feedback".
| if err := s.resetForImport(); err != nil { | ||
| return nil, fmt.Errorf("reset store for import: %w", err) |
There was a problem hiding this comment.
Make import reset atomic with respect to restore failures
Importer() now purges the on-disk store before the first snapshot item is read. In sei-cosmos/storev2/rootmulti/store.go's restore loop, any decode/validation failure sets restoreErr but still calls scImporter.Close(), which finalizes whatever nodes were imported so far. That means an interrupted or corrupt state-sync stream now deletes the previously valid FlatKV state and replaces it with a partial (or even empty) snapshot at height, instead of failing without destroying the old store.
Useful? React with 👍 / 👎.
sei-db/state_db/sc/flatkv/store.go
Outdated
| logRemoveErr := func(path string, err error) { | ||
| if err != nil { | ||
| logger.Error("resetForImport: failed to remove", "path", path, "err", err) | ||
| } |
There was a problem hiding this comment.
Fail the import if stale snapshots or WAL cannot be removed
The new purge path only logs removal errors and continues. If working/, any snapshot-*, or changelog/ cannot be deleted (for example due to permissions or a lingering external handle), resetForImport still reopens the DB and reports success. In that case open() can import on top of surviving snapshot data, and the next LoadVersion() will replay the old WAL in catchup(), recreating exactly the mixed pre-sync state this patch is meant to prevent.
Useful? React with 👍 / 👎.
Codecov Report❌ Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## main #3090 +/- ##
==========================================
+ Coverage 58.59% 58.73% +0.13%
==========================================
Files 2096 2094 -2
Lines 173372 172906 -466
==========================================
- Hits 101583 101551 -32
+ Misses 62746 62384 -362
+ Partials 9043 8971 -72
Flags with carried forward coverage won't be shown. Click here to find out more.
🚀 New features to boost your workflow:
|
sei-db/state_db/sc/flatkv/store.go
Outdated
| if err != nil { | ||
| logger.Error("resetForImport: failed to remove", "path", path, "err", err) |
There was a problem hiding this comment.
If failure to remove data can result in a corrupted final state, then I think it's better to return an error and stop the sync. IMO, an offline machine is better than an online one with high probability of corruption.
There was a problem hiding this comment.
makes sense, updated
Summary
FlatKV state sync import previously only upserted snapshot key-value pairs into the existing store. Keys that existed locally but were absent from the remote snapshot survived the import, producing a mixed state with stale data. This PR introduces a full data purge before import so the resulting store exactly matches the snapshot.
sei-db/state_db/sc/flatkv/store.go:Importer()now callsresetForImport()instead of conditionally reopening a closed store.resetForImportcloses all DB handles, re-acquires the file lock (released byrootmulti.Restore's earlierClose()), then removes the working directory, all snapshot directories, thecurrentsymlink, and the changelog WAL. It reopens from a pristine empty state and resets in-memory LtHash/version to zero. Deletion failures are logged but do not abort the import.closeDBsOnlyerrors are now propagated instead of silently discarded, matching the error-handling pattern inRollback.Test plan
sei-db/state_db/sc/flatkv/exporter_test.go:TestImportPurgesStaleData: Populates a store with keys across all four underlying PebbleDBs (storage, account/nonce, account/codehash, code) including stale keys for a third address. Builds a snapshot containing only two of the three addresses. Imports the snapshot into the pre-populated store and verifies: (1) stale keys from every DB type are absent, (2) active keys carry the new snapshot values, (3) LtHash matches the source, (4) the store survives a close/reopen cycle with stale keys still absent and LtHash intact.