
fix: implement separate cache for byte-range-requests #3710

Open
d-v-b wants to merge 2 commits into zarr-developers:main from d-v-b:fix/cache-store-byte-range

Conversation

@d-v-b
Contributor

d-v-b commented Feb 15, 2026

This PR adds support for byte-range caching to the experimental CacheStore. Claude did all the heavy lifting here, so we need to review carefully.

Some design notes:

The dual-store design of the CacheStore complicates caching range requests. I considered 3 options:

  • Don't cache range requests at all. Cache the entire object, then do byte-range reads against that object. The major downside is the cost of fetching the entire object, which could defeat the purpose of the range request in the first place.
  • Cache range requests in the regular caching layer, using a key that's a unique stringification of the byte-range request, like f"{key}.{request type}.{request params}". The downside is potentially littering the cache (which might be the local file system) with non-Zarr keys.
  • Use a separate in-memory cache specifically for byte ranges. This is the option I took. It's simple, but a downside is that sharded data won't end up stored in the primary cache unless the user explicitly fetches an entire shard.
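
A minimal sketch of the third option, for illustration only: `RangeCache` and the `fetch_range(key, start, end)` callable are hypothetical stand-ins, not the CacheStore API, and ranges are assumed to be half-open `[start, end)` offsets. Range results live in a dict keyed by `(key, start, end)`, separate from the primary cache, so no synthetic keys reach the cache store.

```python
from typing import Callable, Dict, Tuple

# Hypothetical signature for the slow backing store: returns bytes [start, end).
FetchRange = Callable[[str, int, int], bytes]


class RangeCache:
    """Hypothetical in-memory cache for byte-range reads, kept apart from the primary cache."""

    def __init__(self, fetch_range: FetchRange) -> None:
        self._fetch_range = fetch_range
        # Keyed by (store key, start, end); nothing here is written to the primary cache.
        self._entries: Dict[Tuple[str, int, int], bytes] = {}
        self.misses = 0

    def get(self, key: str, start: int, end: int) -> bytes:
        cache_key = (key, start, end)
        if cache_key not in self._entries:
            # Cache miss: fetch only the requested bytes, not the whole object.
            self.misses += 1
            self._entries[cache_key] = self._fetch_range(key, start, end)
        return self._entries[cache_key]


# Usage: a dict of bytes stands in for a remote shard.
remote = {"c/0/0": bytes(range(100))}
cache = RangeCache(lambda k, s, e: remote[k][s:e])
first = cache.get("c/0/0", 10, 20)   # miss: fetched from "remote"
second = cache.get("c/0/0", 10, 20)  # hit: served from the range cache
```

Because entries are keyed by exact ranges, an overlapping but different range is a miss; this sketch makes no attempt to coalesce or slice overlapping ranges, and it has no size limit.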

I would love to hear other ideas / suggestions. Maybe we support all 3 options via a parameter on the cache store?
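
If all three options were exposed through a parameter, the dispatch might look roughly like this. `RangeCacheMode`, `read_range`, and the synthetic key format `f"{key}.range.{start}-{end}"` are hypothetical (the key is just one instance of the `f"{key}.{request type}.{request params}"` scheme), and plain dicts stand in for the remote store, the primary cache, and the range cache.

```python
from enum import Enum
from typing import Dict, Tuple


class RangeCacheMode(Enum):
    """Hypothetical knob selecting one of the three strategies."""

    FULL_OBJECT = "full_object"  # option 1: cache the whole object, slice locally
    KEYED = "keyed"              # option 2: store ranges in the primary cache under synthetic keys
    SEPARATE = "separate"        # option 3: separate in-memory range cache


def read_range(
    mode: RangeCacheMode,
    key: str,
    start: int,
    end: int,
    remote: Dict[str, bytes],
    primary: Dict[str, bytes],
    range_cache: Dict[Tuple[str, int, int], bytes],
) -> bytes:
    if mode is RangeCacheMode.FULL_OBJECT:
        if key not in primary:
            primary[key] = remote[key]  # a miss fetches the entire object
        return primary[key][start:end]
    if mode is RangeCacheMode.KEYED:
        synthetic = f"{key}.range.{start}-{end}"  # a non-Zarr key lands in the primary cache
        if synthetic not in primary:
            primary[synthetic] = remote[key][start:end]
        return primary[synthetic]
    # SEPARATE: range reads never touch the primary cache.
    range_key = (key, start, end)
    if range_key not in range_cache:
        range_cache[range_key] = remote[key][start:end]
    return range_cache[range_key]
```

The trade-offs from the list show up directly: `FULL_OBJECT` reads the whole object on a miss, `KEYED` writes non-Zarr keys into the primary cache, and `SEPARATE` leaves the primary cache untouched.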

github-actions bot added the needs release notes label (automatically applied to PRs which haven't added release notes) Feb 15, 2026
@d-v-b
Contributor Author

d-v-b commented Feb 15, 2026

closes #3690

d-v-b requested a review from maxrjones February 16, 2026 09:45
@d-v-b
Contributor Author

d-v-b commented Feb 16, 2026

@dsparber take a look and let me know if this works for your needs

@dsparber

Looks great, thanks a lot! The double cache approach seems like a good trade-off.

One small concern (maybe I overlooked it): is there a way to limit the size of the in-memory cache for ranges? Just to avoid unbounded RAM usage when making many queries to the Zarr store.

@d-v-b
Contributor Author

d-v-b commented Feb 16, 2026

Ranges and full fetches both count toward the cache size limit, so there shouldn't be a risk of unbounded RAM usage specifically from range queries.
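
A sketch of what a shared limit can look like, assuming least-recently-used eviction and a budget measured in bytes (both are assumptions; `SharedBudgetLRU` is a hypothetical name). For brevity it keeps both kinds of entry in one ordered map with tagged keys, whereas the PR keeps ranges in a separate in-memory cache; the only point illustrated is that a range entry and a full-object entry are charged against, and evicted from, the same budget.

```python
from collections import OrderedDict
from typing import Hashable, Optional


class SharedBudgetLRU:
    """Hypothetical LRU cache where full objects and byte ranges share one byte budget."""

    def __init__(self, max_bytes: int) -> None:
        self.max_bytes = max_bytes
        self.used_bytes = 0
        # Keys are ("full", key) or ("range", key, start, end); insertion order tracks recency.
        self._entries: "OrderedDict[Hashable, bytes]" = OrderedDict()

    def get(self, entry_key: Hashable) -> Optional[bytes]:
        value = self._entries.get(entry_key)
        if value is not None:
            self._entries.move_to_end(entry_key)  # mark as most recently used
        return value

    def put(self, entry_key: Hashable, value: bytes) -> None:
        if len(value) > self.max_bytes:
            return  # never cache an entry larger than the whole budget
        if entry_key in self._entries:
            self.used_bytes -= len(self._entries.pop(entry_key))
        self._entries[entry_key] = value
        self.used_bytes += len(value)
        # Evict least recently used entries, of either kind, until back under budget.
        while self.used_bytes > self.max_bytes:
            _, evicted = self._entries.popitem(last=False)
            self.used_bytes -= len(evicted)

    def __contains__(self, entry_key: Hashable) -> bool:
        return entry_key in self._entries


# Usage: a 100-byte budget shared by one full object and two ranges.
lru = SharedBudgetLRU(max_bytes=100)
lru.put(("full", "c/0"), b"x" * 60)
lru.put(("range", "c/1", 0, 30), b"y" * 30)   # 90 bytes used
lru.put(("range", "c/1", 30, 60), b"z" * 30)  # would be 120, so ("full", "c/0") is evicted
```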
