[SPARK-56319] ShuffleExternalSorter should reuse write buffer #55136

Open
leixm wants to merge 1 commit into apache:master from leixm:SPARK-56319

Conversation

@leixm
Contributor

@leixm leixm commented Apr 1, 2026

What changes were proposed in this pull request?

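[SPARK-56319] Promote the write buffer in ShuffleExternalSorter to a lazily initialized instance field, so that writeSortedFile() reuses one buffer instead of allocating a new byte[] on every call.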

Why are the changes needed?

In ShuffleExternalSorter, the writeSortedFile() method allocates a new byte[] of diskWriteBufferSize (default 1 MB) on every invocation. This method is called once per spill and once more for the final file write, so in a spill-heavy shuffle task the same 1 MB array is allocated and discarded repeatedly, adding unnecessary GC pressure.

This PR promotes the write buffer to a lazily-initialized instance field so that it is allocated on the first call to writeSortedFile() and reused across subsequent spills. Lazy initialization ensures that the buffer is not allocated until it is actually needed, so tasks that insert no records (and therefore never call into the write path) pay no extra cost.
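The lazy-reuse pattern described above can be sketched roughly as follows. This is a minimal, self-contained illustration and not the actual patch: the class `LazyWriteBufferSketch`, the helper `getWriteBuffer()`, the `allocationCount()` counter, and the simplified `writeSortedFile(byte[][], ByteArrayOutputStream)` signature are all hypothetical stand-ins. The sketch also assumes single-threaded use, with one sorter driven by one task thread.

```java
import java.io.ByteArrayOutputStream;

/** Hypothetical stand-in for the sorter; NOT the real ShuffleExternalSorter. */
class LazyWriteBufferSketch {
    // Buffer size in bytes (the PR description cites a 1 MB default).
    private final int diskWriteBufferSize;

    // Stays null until the first write, then is reused by every later write.
    private byte[] writeBuffer;

    // Illustration only: counts how many times the buffer was allocated.
    private int allocations = 0;

    LazyWriteBufferSketch(int diskWriteBufferSize) {
        this.diskWriteBufferSize = diskWriteBufferSize;
    }

    /** Allocates the buffer on first use and returns the same array afterwards. */
    private byte[] getWriteBuffer() {
        if (writeBuffer == null) {
            writeBuffer = new byte[diskWriteBufferSize];
            allocations++;
        }
        return writeBuffer;
    }

    /**
     * Simplified stand-in for writeSortedFile(): copies each record through
     * the shared buffer in chunks and returns the number of bytes written.
     */
    long writeSortedFile(byte[][] records, ByteArrayOutputStream out) {
        byte[] buf = getWriteBuffer();
        long written = 0;
        for (byte[] record : records) {
            int offset = 0;
            while (offset < record.length) {
                int toTransfer = Math.min(buf.length, record.length - offset);
                System.arraycopy(record, offset, buf, 0, toTransfer);
                out.write(buf, 0, toTransfer);
                offset += toTransfer;
                written += toTransfer;
            }
        }
        return written;
    }

    int allocationCount() {
        return allocations;
    }
}
```

In this sketch, an instance that never calls `writeSortedFile()` never allocates the buffer, and any number of calls on the same instance share a single allocation. Because the buffer is only scratch space that is fully overwritten before each write, nothing needs to be cleared between calls.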

Does this PR introduce any user-facing change?

No.

How was this patch tested?

Existing unit tests.

Was this patch authored or co-authored using generative AI tooling?

No.

@leixm
Contributor Author

leixm commented Apr 1, 2026

@cloud-fan Can you help review?

@leixm
Contributor Author

leixm commented Apr 2, 2026

cc @LuciferYang Can you help review this PR?
