split: stream data using copy to not overwhelm memory with large files #10251
Conversation
Merging this PR will improve performance by ×4.9
@ChrisDryden, what is your take on the perf regression?
I don't think the perf regression is acceptable. Right now io::copy uses an 8 KiB buffer, which is causing the regression. I see that head uses a 64 KiB buffer instead; I'll try increasing the buffer size to see if that mitigates the regression.
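For reference, here is a minimal sketch of a streaming copy with a caller-chosen buffer size; the function name and the 64 KiB constant (borrowed from head's reported size) are illustrative assumptions, not this PR's actual code:

```rust
use std::io::{self, Read, Write};

// 64 KiB, the size `head` reportedly uses; purely illustrative here.
const BUF_SIZE: usize = 64 * 1024;

/// Stream `reader` into `writer` through one fixed-size buffer, so memory
/// use stays constant regardless of input size.
fn copy_with_buffer<R: Read, W: Write>(reader: &mut R, writer: &mut W) -> io::Result<u64> {
    let mut buf = vec![0u8; BUF_SIZE];
    let mut total = 0u64;
    loop {
        let n = reader.read(&mut buf)?;
        if n == 0 {
            break; // EOF
        }
        writer.write_all(&buf[..n])?;
        total += n as u64;
    }
    Ok(total)
}
```

A larger buffer amortizes the per-read syscall overhead, which is the tradeoff against io::copy's smaller internal buffer being discussed above.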
Much better now: this solves the original issue, with huge memory and performance improvements. The original fix used an 8 KiB default buffer; testing locally, I found it hit diminishing returns at 128 KiB.
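To illustrate why streaming keeps memory flat for split specifically, here is a hypothetical sketch of chunking an input via Read::take, copying at most `chunk_size` bytes per output file; the function name and file naming are made up for illustration, not the PR's implementation:

```rust
use std::fs::File;
use std::io::{self, BufWriter, Read, Write};

/// Split `input` into files of at most `chunk_size` bytes each. Only one
/// copy buffer is resident at a time, so memory use is independent of the
/// total input size.
fn split_stream<R: Read>(mut input: R, chunk_size: u64) -> io::Result<()> {
    let mut index = 0usize;
    loop {
        let mut out = BufWriter::new(File::create(format!("chunk_{index:04}"))?);
        // `take` caps the reader at `chunk_size` bytes, so `io::copy`
        // stops at the chunk boundary instead of draining the input.
        let copied = io::copy(&mut input.by_ref().take(chunk_size), &mut out)?;
        out.flush()?;
        if copied == 0 {
            // Input exhausted. (This sketch leaves one empty trailing
            // file; a real implementation would avoid creating it.)
            break;
        }
        index += 1;
    }
    Ok(())
}
```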
Force-pushed from 7d5b0c0 to 3d2d0ba
This should solve issue #10250, where split can run out of memory on large files. I was able to use the built-in integration test support for limiting resources to mock a scenario where the file is larger than the available memory, triggering the conditions for the bug.
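For illustration, one way to cap a test process's memory so that a file larger than memory triggers the out-of-memory path is setrlimit. This sketch assumes a Unix target and the libc crate directly, not the test harness's actual resource-limiting API:

```rust
/// Cap the current process's address space at `bytes`, so allocations
/// beyond that fail instead of swapping. Unix-only; assumes the `libc`
/// crate is a dependency.
fn limit_address_space(bytes: u64) {
    let lim = libc::rlimit {
        rlim_cur: bytes as libc::rlim_t,
        rlim_max: bytes as libc::rlim_t,
    };
    // SAFETY: setrlimit only reads the rlimit struct we pass it.
    let rc = unsafe { libc::setrlimit(libc::RLIMIT_AS, &lim) };
    assert_eq!(rc, 0, "setrlimit failed");
}
```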