
perf: Vectorize get_chunk_slice for faster sharded writes#3713

Open
mkitti wants to merge 21 commits into zarr-developers:main from mkitti:mkitti-get-chunk-slice-vectorization

Conversation

Contributor

@mkitti mkitti commented Feb 17, 2026

Summary

This PR adds vectorized methods to _ShardIndex and _ShardReader for batch chunk slice lookups, significantly reducing per-chunk function call overhead when writing to shards.

Changes

New Methods

_ShardIndex.get_chunk_slices_vectorized: Batch lookup of chunk slices using NumPy vectorized operations instead of per-chunk Python calls.

_ShardReader.to_dict_vectorized: Build a chunk dictionary using vectorized lookup instead of iterating with individual get() calls.
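The batch lookup can be illustrated with a minimal NumPy sketch. The `MAX_UINT_64` sentinel for empty chunk entries follows zarr's sharding-index convention, but the function below is a hypothetical stand-in, not the PR's actual implementation:

```python
import numpy as np

MAX_UINT_64 = np.uint64(2**64 - 1)  # sentinel marking an empty chunk entry

def get_chunk_slices_vectorized(offsets_and_lengths, chunk_coords):
    """Look up (offset, length) entries for many chunks with one NumPy
    fancy-index operation instead of one Python call per chunk.

    offsets_and_lengths: uint64 array of shape chunks_per_shard + (2,)
    chunk_coords: integer array of shape (n, ndim)
    """
    entries = offsets_and_lengths[tuple(chunk_coords.T)]  # shape (n, 2)
    offsets, lengths = entries[:, 0], entries[:, 1]
    empty = offsets == MAX_UINT_64
    return [
        None if is_empty else slice(int(o), int(o + l))
        for o, l, is_empty in zip(offsets, lengths, empty)
    ]
```

The single fancy-index lookup replaces n separate `__getitem__` calls, which is where the per-chunk overhead disappears.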

Modified Code Path

In _encode_partial_single, replaced:

shard_dict = {k: shard_reader.get(k) for k in morton_order_iter(chunks_per_shard)}

With vectorized approach:

morton_coords = _morton_order(chunks_per_shard)
chunk_coords_array = np.array(morton_coords, dtype=np.uint64)
shard_dict = shard_reader.to_dict_vectorized(chunk_coords_array, morton_coords)
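A self-contained sketch of the dictionary-building step (hypothetical names and signature; the real method lives on `_ShardReader` and operates on the shard's buffer):

```python
import numpy as np

MAX_UINT_64 = np.uint64(2**64 - 1)  # sentinel marking an empty chunk entry

def to_dict_vectorized(shard_bytes, index, chunk_coords):
    """Build {chunk_coord: bytes_or_None} with one batched index lookup
    instead of iterating with individual get() calls."""
    coords = np.asarray(chunk_coords, dtype=np.uint64)
    entries = index[tuple(coords.T)]  # (n, 2) rows of (offset, length)
    return {
        tuple(int(x) for x in c): (
            None if entry[0] == MAX_UINT_64
            else shard_bytes[int(entry[0]) : int(entry[0] + entry[1])]
        )
        for c, entry in zip(chunk_coords, entries)
    }
```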

Benchmark Results

Single Chunk Write to Large Shard

Writing a single 1x1x1 chunk to a shard with 32³ chunks (using test_sharded_morton_write_single_chunk from PR #3712):

| Optimization | Time | Speedup vs main |
| --- | --- | --- |
| Main branch (original) | 422 ms | - |
| + Morton optimization (PR #3708) | 261 ms | 1.6x |
| + Vectorized get_chunk_slice | 95 ms | 4.4x |

Profile Breakdown

| Function | Before | After |
| --- | --- | --- |
| get_chunk_slice + _localize_chunk | 215 ms | 3 ms |
| to_dict_vectorized loop | 81 ms | 9 ms |
| Total function calls | 299k | 37k |

Checklist

  • Add unit tests and/or doctests in docstrings
  • Add docstrings and API docs for any new/modified user-facing classes and functions
  • New/modified features documented in docs/user-guide/*.md
  • Changes documented as a new file in changes/
  • GitHub Actions have all passed
  • Test coverage is 100% (Codecov passes)

mkitti and others added 21 commits February 13, 2026 00:15
Add benchmarks that clear the _morton_order LRU cache before each
iteration to measure the full Morton computation cost:

- test_sharded_morton_indexing: 512-4096 chunks per shard
- test_sharded_morton_indexing_large: 32768 chunks per shard

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
Add vectorized methods to _ShardIndex and _ShardReader for batch
chunk slice lookups, reducing per-chunk function call overhead
when writing to shards.

Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>

mkitti commented Feb 18, 2026

@d-v-b add the benchmark tag here as well please.

@d-v-b d-v-b added the benchmark Code will be benchmarked in a CI job. label Feb 18, 2026

codspeed-hq bot commented Feb 18, 2026

Merging this PR will improve performance by ×7.2

⚡ 4 improved benchmarks
✅ 52 untouched benchmarks
⏩ 6 skipped benchmarks [1]

Performance Changes

| Mode | Benchmark | BASE | HEAD | Efficiency |
| --- | --- | --- | --- | --- |
| WallTime | test_morton_order_iter[(8, 8, 8)] | 6.2 ms | 1.1 ms | ×5.5 |
| WallTime | test_morton_order_iter[(16, 16, 16)] | 56.2 ms | 8.1 ms | ×6.9 |
| WallTime | test_morton_order_iter[(32, 32, 32)] | 502.6 ms | 70.2 ms | ×7.2 |
| WallTime | test_sharded_morton_write_single_chunk[(32, 32, 32)-memory] | 953.3 ms | 164.4 ms | ×5.8 |

Comparing mkitti:mkitti-get-chunk-slice-vectorization (157d283) with main (36caf1f)


Footnotes

  1. 6 benchmarks were skipped, so the baseline results were used instead. Benchmarks that were deleted from the codebase can be archived in CodSpeed to remove them from the performance reports.

# Use vectorized lookup for better performance
morton_coords = _morton_order(chunks_per_shard)
chunk_coords_array = np.array(morton_coords, dtype=np.uint64)
shard_dict = shard_reader.to_dict_vectorized(chunk_coords_array, morton_coords)
Contributor
why does to_dict_vectorized take the same values as 2 different types? Can't we pass in morton_coords and have the array version created internally as an implementation detail?
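A minimal sketch of the simplification being suggested (hypothetical wrapper name and signature): accept the coordinates once and derive the array form internally:

```python
import numpy as np

def to_dict_from_coords(shard_reader, chunk_coords):
    """Single-argument convenience form: callers pass one coordinate
    iterable; the uint64 array used for fancy indexing is built
    internally as an implementation detail."""
    coords_list = list(chunk_coords)
    coords_array = np.array(coords_list, dtype=np.uint64)
    return shard_reader.to_dict_vectorized(coords_array, coords_list)
```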

