fix: copy bech32 cache key to prevent SIGSEGV from mmap'd zero-copy data #3106
Draft
masih wants to merge 1 commit into release/v6.4 from
`cacheBech32Addr` stored the caller-provided `cacheKey` (created via `UnsafeBytesToStr`) directly in the global LRU cache. `UnsafeBytesToStr` produces a string that shares backing memory with the original `[]byte` slice. When that slice originates from a MemIAVL zero-copy read, its backing memory is mmap'd from a snapshot file on disk. During snapshot rotation, `Tree.ReplaceWith()` closes the old snapshot, which calls `munmap()`, unmapping the memory region. Any LRU cache keys still pointing into that region become dangling pointers. The next map lookup in the LRU triggers `memeqbody` on unmapped memory, causing a SIGSEGV that halts the node.

Fix: use `string(addr)` instead of `cacheKey` when inserting into the cache. `string([]byte)` always allocates a heap-backed copy, so the cache key survives independently of the original slice's lifetime.

Performance impact: zero on the hot path (cache hits still use the zero-alloc `UnsafeBytesToStr` for lookup). On cache misses, one extra 20-byte allocation is negligible next to the bech32 encoding already performed.

Co-authored-by: Masih H. Derkani <m@derkani.org>
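The aliasing described in the commit message can be reproduced without mmap. The sketch below is illustrative only: `unsafeBytesToStr` is a hypothetical stand-in for `UnsafeBytesToStr` (built on Go 1.20+ `unsafe.String` and `unsafe.SliceData`), and it deliberately overwrites the bytes behind a string, which Go code must normally never do. Because the buffer here is ordinary heap memory, the aliased map key is silently corrupted instead of faulting; in the real bug the same dangling key points at unmapped memory, so the comparison segfaults.

```go
package main

import (
	"fmt"
	"unsafe"
)

// unsafeBytesToStr is a hypothetical stand-in for UnsafeBytesToStr: it
// reinterprets b as a string without copying, so the string's data pointer
// is b's backing array (requires Go 1.20+ for unsafe.String/SliceData).
func unsafeBytesToStr(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// demo inserts the same address into two maps, once with a zero-copy key and
// once with a copied key, then overwrites the source buffer in place. The
// overwrite stands in for the region being unmapped: on the heap it silently
// corrupts the aliased key instead of faulting.
func demo() (aliasedHit, copiedHit bool) {
	buf := []byte("addr-A") // plays the role of a zero-copy slice

	aliased := map[string]bool{unsafeBytesToStr(buf): true} // key shares buf's memory
	copied := map[string]bool{string(buf): true}             // key owns a private copy

	copy(buf, "addr-B") // deliberately mutate the bytes behind the aliased key

	_, aliasedHit = aliased["addr-A"]
	_, copiedHit = copied["addr-A"]
	return aliasedHit, copiedHit
}

func main() {
	aliasedHit, copiedHit := demo()
	fmt.Println("aliased key still found:", aliasedHit) // false
	fmt.Println("copied key still found:", copiedHit)   // true
}
```

The copied key keeps matching because `string(buf)` owns its bytes; the aliased key no longer compares equal to the address it was stored under.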
Codecov Report
✅ All modified and coverable lines are covered by tests.

| | release/v6.4 | #3106 |
|---|---:|---:|
| Coverage | 58.39% | 58.39% |
| Files | 2088 | 2088 |
| Lines | 172135 | 172135 |
| Hits | 100525 | 100525 |
| Misses | 62647 | 62647 |
| Partials | 8963 | 8963 |

Flags with carried forward coverage won't be shown.
Problem
Node actic-1 halted at block 149060323 with a SIGSEGV in goroutine 334 (CheckTx):
The crash occurred in `memeqbody` during a Go map lookup inside the `accAddrCache` LRU, called from `AccAddress.String()` in the EVM fee charging path.

Root Cause
`cacheBech32Addr` stored the caller-provided `cacheKey` (created via `UnsafeBytesToStr`) directly as a map key in the global `accAddrCache` LRU. `UnsafeBytesToStr` produces a string that shares backing memory with the original `[]byte` slice.

When that slice originates from a MemIAVL zero-copy read (which is the default, `ZeroCopy: true`), its backing memory is `mmap`'d from a snapshot file on disk. During snapshot rotation, `Tree.ReplaceWith()` closes the old snapshot via `snapshot.Close()` → `kvsMap.Close()` → `mmap.Munmap()`, unmapping the memory region.

Any LRU cache keys still pointing into that region become dangling pointers. The next map lookup in the LRU triggers `memeqbody` on unmapped memory → SIGSEGV.

The fault address (`0x7fd4774efb0d`) and the lookup key data pointer (`0x7fd40dc23b0d`) are both in the Linux mmap region, not the Go heap (`0xc0...`), confirming this is unmapped memory-mapped file data.

Fix
Use `string(addr)` instead of `cacheKey` when inserting into the LRU cache. `string([]byte)` always allocates a heap-backed copy, so the cache key survives independently of the original slice's lifetime.

Performance Impact
Zero on the hot path. Cache hits (99%+ of calls) still use the zero-allocation `UnsafeBytesToStr` for lookup. On cache misses, one extra 20-byte allocation is negligible next to the bech32 encoding already performed on that path.

Scope
The fix is in `cacheBech32Addr`, which is the single function used by all three address caches (`accAddrCache`, `valAddrCache`, `consAddrCache`), so all three are protected by this one-line change.
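For illustration, the lookup/insert split can be sketched as a self-contained program. Every name here is a hypothetical stand-in, not the SDK code: `lru` is a minimal LRU built on `container/list` rather than the cache type the SDK uses, `encode` is a placeholder that hex-encodes instead of performing bech32 encoding, `unsafeBytesToStr` mimics `UnsafeBytesToStr` (Go 1.20+), `addrString` plays the role of `AccAddress.String()`, and `cacheBech32Addr` is simplified to take no `cacheKey` parameter. What it shows is a zero-copy key for lookups and a copied key for inserts.

```go
package main

import (
	"container/list"
	"encoding/hex"
	"fmt"
	"sync"
	"unsafe"
)

// entry is what each list element holds; key is kept for eviction.
type entry struct{ key, val string }

// lru is a minimal, hypothetical LRU cache (front of ll = most recently used).
type lru struct {
	capacity int
	ll       *list.List
	items    map[string]*list.Element
}

func newLRU(capacity int) *lru {
	return &lru{capacity: capacity, ll: list.New(), items: make(map[string]*list.Element)}
}

// Get looks up key without retaining it, so a zero-copy key is safe here.
func (c *lru) Get(key string) (string, bool) {
	if el, ok := c.items[key]; ok {
		c.ll.MoveToFront(el)
		return el.Value.(*entry).val, true
	}
	return "", false
}

// Add stores key in the map, so key must own its bytes.
func (c *lru) Add(key, val string) {
	if el, ok := c.items[key]; ok {
		el.Value.(*entry).val = val
		c.ll.MoveToFront(el)
		return
	}
	c.items[key] = c.ll.PushFront(&entry{key: key, val: val})
	if c.ll.Len() > c.capacity {
		oldest := c.ll.Back()
		c.ll.Remove(oldest)
		delete(c.items, oldest.Value.(*entry).key)
	}
}

var (
	addrMu    sync.Mutex
	addrCache = newLRU(1024)
)

// unsafeBytesToStr is a hypothetical zero-copy conversion (shares b's memory).
func unsafeBytesToStr(b []byte) string {
	if len(b) == 0 {
		return ""
	}
	return unsafe.String(unsafe.SliceData(b), len(b))
}

// encode is a hypothetical placeholder for bech32 encoding (it is NOT bech32).
func encode(prefix string, addr []byte) string {
	return prefix + "1" + hex.EncodeToString(addr)
}

// cacheBech32Addr encodes addr and caches it under a key that owns its bytes:
// string(addr) copies, so the key outlives addr's backing memory.
func cacheBech32Addr(prefix string, addr []byte, cache *lru) string {
	s := encode(prefix, addr)
	cache.Add(string(addr), s)
	return s
}

// addrString mirrors the hot path: zero-copy lookup, copying insert on a miss.
func addrString(prefix string, addr []byte) string {
	addrMu.Lock()
	defer addrMu.Unlock()
	if s, ok := addrCache.Get(unsafeBytesToStr(addr)); ok {
		return s // hit: no allocation for the lookup key
	}
	return cacheBech32Addr(prefix, addr, addrCache)
}

func main() {
	buf := make([]byte, 20) // stands in for a zero-copy slice from a snapshot
	buf[19] = 1
	first := addrString("acc", buf)

	buf[19] = 2 // clobber the source, as reused or unmapped memory would

	orig := make([]byte, 20)
	orig[19] = 1
	_, hit := addrCache.Get(string(orig))
	fmt.Println("cache hit after source mutation:", hit)               // true
	fmt.Println("same encoding:", addrString("acc", orig) == first) // true
}
```

Lookups can keep the zero-copy key because a Go map lookup never stores its key; only `Add` retains it, so only the insert path pays for the copy.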