EmergePlanner: Optimize _get_emerge_parts() for better grouping of parts in the upload buffer#535
EmergePlanner: Optimize _get_emerge_parts() for better grouping of parts in the upload buffer#535LewyMaster wants to merge 1 commit intoBackblaze:masterfrom
Conversation
…rts in the upload buffer
|
we were discussing back then splitting implementations for streaming and "full knowledge" planner - then we would be able to plan better in such edge cases - but this is only for small parts - so we decided to keep single implementation and leave such edge cases suboptimal - in most applications parts will be larger than 5MB and there is no problem - if there is valid usecase then I think dedicated implementation of planner should be produced for "full knowledge" case - modifying streaming logic is nightmare :) |
ppolewicz
left a comment
There was a problem hiding this comment.
I didn't read it in detail, because it doesn't have a single test yet. The emerger planner has very nice tests, if I remember correctly - could you please add one showing the case described in task description, make sure that it does what it should do?
| current_intent, | ||
| start_offset=upload_buffer.end_offset, | ||
| end_offset=current_end, | ||
| end_offset=current_end |
There was a problem hiding this comment.
Please use trailing commas on multi-line statements so that if someone wants to add an argument, the diff consists of just one "added" line and not 2 added 1 removed
| upload_buffer = UploadBuffer(current_end) | ||
| else: | ||
| upload_buffer = UploadBuffer(current_end) |
There was a problem hiding this comment.
| upload_buffer = UploadBuffer(current_end) | |
| else: | |
| upload_buffer = UploadBuffer(current_end) | |
| upload_buffer = UploadBuffer(current_end) |
| # we "borrow" a fragment of current intent to upload buffer | ||
| # to fill it to minimum part size |
There was a problem hiding this comment.
| # we "borrow" a fragment of current intent to upload buffer | |
| # to fill it to minimum part size | |
| # we "borrow" a fragment of current intent to upload buffer | |
| # to fill it to minimum part size |
EmergePlanner optimization solution as a task for job recruiting process.
Task:
`implementation of
b2-sdk-python/b2sdk/transfer/emerge/planner/planner.py
Line 191 in 10a5e72
is a little bit naiive, because (with 5MB minimum part size) if we mark
"u" as a megabyte of data to upload
"c" as a megabyte of data to copy
"d" as a megabyte of data that will be downloaded and uploaded
then for "input", "master" pattern will be executed by the current version of the code while ideally in such case the "ideal" pattern would be used:
input: uuu cccccccccc uu ccccc ...
master: uuu ddcccccccc uu ddddd ...
ideal: uuu ddcccccddd uu ccccc ...
This is because Planner is only considering to bundle an upload with a part of a copy from the right side of the upload and not also from the left. Appropriately to the quality of the code in this project, tests should be provided and the code of the planner should remain readable. The behavior of the planner should strictly improve and the changes done to it should not have any negative impact on any input case (beyond marginally higher CPU consumption during planning).
`