Skip to content

[common] Deep copy BinaryString in createDeepFieldGetter to fix use-after-free#3045

Closed
YannByron wants to merge 1 commit intoapache:mainfrom
YannByron:fix-flaky-remote-log-scanner
Closed

[common] Deep copy BinaryString in createDeepFieldGetter to fix use-after-free#3045
YannByron wants to merge 1 commit intoapache:mainfrom
YannByron:fix-flaky-remote-log-scanner

Conversation

@YannByron
Copy link
Copy Markdown
Contributor

Summary

  • Fix flaky test RemoteLogScannerITCase#testScanFromRemoteAndProject (Issue [test] Unstable test RemoteLogScannerITCase.testScanFromRemoteAndProject #2992)
  • IndexedRow.getString() returns a zero-copy BinaryString view into the underlying MemorySegment. When DefaultCompletedFetch.drain() releases the backing Netty ByteBuf, these views become dangling references, causing data corruption on platforms that eagerly reclaim freed memory (e.g., Linux CI)
  • Add STRING case to createDeepFieldGetter and createDeepElementGetter that calls BinaryString.copy() to materialize string data into independent heap memory

Test plan

  • RemoteLogScannerITCase#testScanFromRemoteAndProject passes consistently (3/3 runs)
  • All 497 record/row unit tests pass with zero regressions

…fter-free

IndexedRow.getString() returns a zero-copy BinaryString view into the
underlying MemorySegment. When DefaultCompletedFetch.drain() releases
the backing Netty ByteBuf, these BinaryString views become dangling
references, causing data corruption on platforms that eagerly reclaim
freed memory (e.g., Linux CI).

This fixes the flaky test RemoteLogScannerITCase#testScanFromRemoteAndProject
by adding a STRING case to createDeepFieldGetter and createDeepElementGetter
that calls BinaryString.copy() to materialize the string data into
independent heap memory.

Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>
luoyuxia

This comment was marked as outdated.

* array/map/row types.
*
* <p>NOTE: Currently, it is only used for deep copying {@link ColumnarRow} for Arrow which
* avoid the arrow buffer is released before accessing elements. It doesn't deep copy STRING and
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Sorry for miss this part. From the comments, seems STRING is not need to copy. And I'm also afraid copy string will cause performance degrade.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this might be a better approach: #3008

@YannByron
Copy link
Copy Markdown
Contributor Author

Thanks for the review @luoyuxia and @fresh-borzoni!

@luoyuxia #3008 has a more detailed explanation on why STRING still needs to be copied for INDEXED/COMPACTED formats.

@fresh-borzoni Thanks for pointing out #3008! Your approach with the copyStrings flag is indeed more precise — it correctly avoids the copy for Arrow (which already copies strings) and
also handles CHAR. I'm happy to close this PR in favor of #3008.

@YannByron YannByron closed this Apr 10, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants