split: stream data using copy to not overwhelm memory with large files #10251
Conversation
Merging this PR will improve performance by ×4.9
@ChrisDryden, what is your take on the perf regression?
I don't think the perf regression is acceptable. Right now io::copy uses an 8 KiB buffer, which is causing the regression. I see that head uses a 64 KiB buffer instead; I'll try increasing the buffer size to see if that mitigates the regression.
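For reference, here is a minimal sketch of a streaming copy with a caller-chosen buffer size; the function name and the 64 KiB constant (borrowed from head's reported size) are illustrative assumptions, not this PR's actual code:

```rust
use std::io::{self, Read, Write};

// 64 KiB, the size `head` reportedly uses; purely illustrative here.
const BUF_SIZE: usize = 64 * 1024;

/// Stream `reader` into `writer` through one fixed-size buffer, so memory
/// use stays constant regardless of input size.
fn copy_with_buffer<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<u64> {
    let mut buf = vec![0u8; BUF_SIZE];
    let mut total = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}
```

A larger buffer amortizes the per-read syscall overhead, which is the tradeoff against io::copy's smaller internal buffer being discussed above.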
Much better now: this solves the original issue, with huge memory and performance improvements. The original fix used an 8 KiB default buffer; testing locally, I found it hit diminishing returns at 128 KiB.
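To illustrate why streaming keeps memory flat for split specifically, here is a hypothetical sketch of chunking an input via Read::take, copying at most `chunk_size` bytes per output file; the function name and file naming are made up for illustration, not the PR's implementation:

```rust
use std::fs::File;
use std::io::{self, BufWriter, Read, Write};

/// Split `input` into files of at most `chunk_size` bytes each. Only one
/// copy buffer is resident at a time, so memory use is independent of the
/// total input size.
fn split_stream<R: Read>(mut input: R, chunk_size: u64) -> io::Result<()> {
    let mut index = 0usize;
    loop {
        let mut out = BufWriter::new(File::create(format!("chunk_{index:04}"))?);
        // `take` caps the reader at `chunk_size` bytes, so `io::copy`
        // stops at the chunk boundary instead of draining the input.
        let copied = io::copy(&mut input.by_ref().take(chunk_size), &mut out)?;
        out.flush()?;
        if copied == 0 {
            // Input exhausted. (This sketch leaves one empty trailing
            // file; a real implementation would avoid creating it.)
            break;
        }
        index += 1;
    }
    Ok(())
}
```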
Force-pushed from 7d5b0c0 to 3d2d0ba
This should solve issue #10250, where split can run out of memory on large files. I was able to use the built-in integration test support for limiting resources to mock a scenario where the file is larger than the available memory, triggering the conditions for the bug.
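For illustration, one way to cap a test process's memory so that a file larger than memory triggers the out-of-memory path is setrlimit. This sketch assumes a Unix target and the libc crate directly, not the test harness's actual resource-limiting API:

```rust
/// Cap the current process's address space at `bytes`, so allocations
/// beyond that fail instead of swapping. Unix-only; assumes the `libc`
/// crate is a dependency.
fn limit_address_space(bytes: u64) {
    let lim = libc::rlimit {
        rlim_cur: bytes as libc::rlim_t,
        rlim_max: bytes as libc::rlim_t,
    };
    // SAFETY: setrlimit only reads the rlimit struct we pass it.
    let rc = unsafe { libc::setrlimit(libc::RLIMIT_AS, &lim) };
    assert_eq!(rc, 0, "setrlimit failed");
}
```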