The two PRs #3715 and #3719 demonstrate that we can attain substantial performance improvements and completely new functionality by applying a series of changes to stores, codecs, and the codec pipeline logic. We get the performance via the following changes:
- Avoid async overhead when doing IO against low-latency stores
- Avoid overhead related to creating unnecessary Python objects per-chunk
- Formally separate IO from compute in chunk encoding / decoding
- Use range writes, when applicable, to write individual subchunks inside a shard
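To illustrate the first three points together, here is a minimal sketch (not the actual zarr-python implementation; `MemoryStore`, `supports_sync`, and `read_chunk` are hypothetical names) of a chunk-read path where IO is formally separated from compute, and where a low-latency store exposes a synchronous fast path that skips the async machinery and its per-chunk object overhead:

```python
import asyncio
import zlib


class MemoryStore:
    """Hypothetical low-latency store exposing a synchronous fast path."""

    supports_sync = True

    def __init__(self):
        self._data = {}

    def get_sync(self, key: str) -> bytes:
        # Direct dict lookup: no coroutine, no event-loop round trip.
        return self._data[key]

    async def get(self, key: str) -> bytes:
        # Async path, as a high-latency (e.g. remote) store would require.
        return self._data[key]


def decode(raw: bytes) -> bytes:
    """Pure compute step: decompress chunk bytes. No IO happens here."""
    return zlib.decompress(raw)


async def read_chunk(store, key: str) -> bytes:
    """Fetch bytes (IO), then decode them (compute), as separate phases."""
    if getattr(store, "supports_sync", False):
        # Low-latency store: avoid async overhead for the IO phase.
        raw = store.get_sync(key)
    else:
        raw = await store.get(key)
    return decode(raw)


store = MemoryStore()
store._data["c/0"] = zlib.compress(b"chunk-bytes")
result = asyncio.run(read_chunk(store, "c/0"))
```

Keeping `decode` free of IO means the compute phase can be batched, run in a thread pool, or skipped independently of how the bytes were fetched, which is what makes the sync fast path cheap to add.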
The two PRs that demonstrate these changes are far too large to merge on their own, and review is burdened by off-target changes made inadvertently by Claude.
I think the performance wins are too great to pass up, and the cost is manageable if we break the changes into smaller PRs. This issue tracks the progress of these changes, and can serve as a discussion site as needed.
Here's an outline of the PRs I'd like to open shortly:
I will update this issue when I actually open these PRs.