@mkitti mkitti commented Jan 28, 2026

Purpose

Support OME-Zarr zip archives (RFC-9). I am an author of OME-Zarr RFC-9. Zip-based Zarr stores are widespread and are supported by a number of Zarr implementations.

Notably, Neuroglancer implements a zip file key-value store.

Background

Zip files are documented by a PKWARE appnote. A zip file contains local file entries followed by a central directory at the end of the file. While the local file entries at the beginning of the file could be read in a streaming fashion, it is often necessary to read the central directory at the end of the file first. For example, duplicate local file entries may exist at the beginning of the archive, and only the central directory can correctly indicate which one is the latest version.
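
As an illustration of reading the end of the file first, here is a minimal sketch that locates the end of central directory (EOCD) record and extracts the entry count, the central directory size and offset, and the archive comment. The names `read_eocd` and `demo` are hypothetical and are not taken from this pull request. The sketch assumes a seekable file and ignores ZIP64 archives, and it could be misled by a comment that itself contains the EOCD signature.

```python
import io
import struct
import zipfile

EOCD_SIGNATURE = b"PK\x05\x06"  # end of central directory record
EOCD_FIXED_SIZE = 22            # size of the record without its comment
MAX_COMMENT_SIZE = 0xFFFF       # the comment length is a 16-bit field


def read_eocd(f):
    """Return (entry_count, cd_size, cd_offset, comment) from a seekable file."""
    f.seek(0, io.SEEK_END)
    file_size = f.tell()
    # The record lies within the last 22 + 65535 bytes of the file.
    tail_size = min(file_size, EOCD_FIXED_SIZE + MAX_COMMENT_SIZE)
    f.seek(file_size - tail_size)
    tail = f.read(tail_size)
    pos = tail.rfind(EOCD_SIGNATURE)
    if pos < 0:
        raise ValueError("end of central directory record not found")
    (_sig, _disk, _cd_disk, _disk_entries, entry_count,
     cd_size, cd_offset, comment_len) = struct.unpack_from("<4sHHHHIIH", tail, pos)
    start = pos + EOCD_FIXED_SIZE
    comment = tail[start:start + comment_len]
    return entry_count, cd_size, cd_offset, comment


def demo():
    # Build a small archive in memory with the standard library, then read it back.
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("zarr.json", "{}")
        zf.writestr("c/0", b"\x00" * 16)
        zf.comment = b"hello"
    return read_eocd(buf)
```

Only the tail of the file is read, so the cost of this step does not depend on the number of entries in the archive.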

OME-Zarr RFC-9 proposes a standard way to store OME-Zarr datasets in zip files with an OZX extension. A particular challenge for Zarr in zip files is the large number of files, which can make the central directory more difficult to parse. The RFC makes a number of recommendations to minimize the number of files, such as using sharding.

Another recommendation of the RFC is to list zarr.json metadata first in the central directory. This allows an application like fileglancer, which is mainly concerned with parsing metadata, to quickly access the metadata without having to parse the entire central directory. It also allows the entire tree of the hierarchy to be elucidated. To facilitate detection of this optimization, JSON is stored in the zip comment, which occurs at the end of the central directory. This JSON contains a jsonFirst flag to indicate that the writer of the file placed the JSON files first in the central directory. If this flag is true, a reader may assume that no further zarr.json files exist within the archive once the first non-zarr.json file is read.
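
The early-stop rule can be sketched as follows. Both helper names (`find_json_first`, `metadata_names`) are hypothetical. Because the exact nesting of the flag inside the comment JSON is not spelled out here, the sketch searches the parsed document for a `jsonFirst` key at any depth, which is an assumption made for illustration rather than the RFC's lookup rule.

```python
import json


def find_json_first(comment: bytes) -> bool:
    """Return True if the archive comment is JSON containing a true jsonFirst flag."""
    try:
        doc = json.loads(comment.decode("utf-8"))
    except (UnicodeDecodeError, json.JSONDecodeError):
        return False  # no comment, or a comment that is not JSON

    def search(node):
        # Assumption: accept the flag at any nesting depth.
        if isinstance(node, dict):
            if node.get("jsonFirst") is True:
                return True
            return any(search(v) for v in node.values())
        if isinstance(node, list):
            return any(search(v) for v in node)
        return False

    return search(doc)


def metadata_names(names, json_first):
    """Collect zarr.json paths in central directory order.

    When json_first is true, stop at the first entry that is not a zarr.json
    file, since no further metadata files are expected after it.
    """
    found = []
    for name in names:
        if name == "zarr.json" or name.endswith("/zarr.json"):
            found.append(name)
        elif json_first:
            break
    return found
```

Without the flag, the reader has to walk every name, because a zarr.json entry could appear anywhere in the central directory.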

Design

While Python does provide a zipfile standard library module, a key performance optimization here is the ability to read a partial central directory. While browsing, only the metadata may be needed. As such, this pull request contains new code to read the central directory partially, which the standard library does not support.

If a need arises to stream the contents of the zip file for external applications, then some additional reading and caching of the central directory may be needed.

While I was initially working on this feature with OZX in mind, implementing generic zip file support also seemed useful. Thus the core implementation splits out generic zip support and then builds OME-Zarr support on top of that.
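
To make the idea of a partial, resumable read concrete, here is a minimal sketch. It is not the code in this pull request. The class `PartialCentralDirectory` and its `parse` method are hypothetical, the central directory offset and entry count are assumed to have been read from the end of central directory record already, and ZIP64 and non-UTF-8 file names are ignored.

```python
import io
import struct
import zipfile

# Fixed 46-byte part of a central directory file header (little-endian).
CD_HEADER = struct.Struct("<4s6H3I5H2I")
CD_SIGNATURE = b"PK\x01\x02"


class PartialCentralDirectory:
    """Hypothetical resumable reader that parses headers on demand."""

    def __init__(self, f, cd_offset, entry_count):
        self.f = f
        self.next_pos = cd_offset        # where the next unparsed header starts
        self.entry_count = entry_count
        self.entries = []                # (name, compressed_size, local_header_offset)

    def parse(self, max_new_entries):
        """Parse up to max_new_entries further headers; return how many were parsed."""
        parsed = 0
        while parsed < max_new_entries and len(self.entries) < self.entry_count:
            self.f.seek(self.next_pos)
            fields = CD_HEADER.unpack(self.f.read(CD_HEADER.size))
            if fields[0] != CD_SIGNATURE:
                raise ValueError("bad central directory header signature")
            name_len, extra_len, comment_len = fields[10:13]
            name = self.f.read(name_len).decode("utf-8")  # simplification: assume UTF-8
            self.entries.append((name, fields[8], fields[16]))
            # Skip the variable-length name, extra field, and per-file comment.
            self.next_pos += CD_HEADER.size + name_len + extra_len + comment_len
            parsed += 1
        return parsed


def demo():
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for i in range(5):
            zf.writestr(f"c/{i}", b"x")
    data = buf.getvalue()
    # With no archive comment, the end of central directory record is the final 22 bytes.
    entry_count, _cd_size, cd_offset = struct.unpack_from("<HII", data, len(data) - 12)
    reader = PartialCentralDirectory(io.BytesIO(data), cd_offset, entry_count)
    first = reader.parse(2)
    names_after_first = [e[0] for e in reader.entries]
    rest = reader.parse(10)
    return first, names_after_first, rest, len(reader.entries)
```

Because the reader remembers where it stopped, a later call continues from that position instead of rescanning the directory from the start.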

mkitti and others added 8 commits January 28, 2026 01:49
Large archives with thousands of entries can be slow to load. This adds
pagination support to load entries incrementally:

- Backend: /api/ozx-list now accepts offset/limit params and returns
  total_count, has_more for pagination
- Frontend: New useOzxFileEntriesInfiniteQuery hook with TanStack Query
- ZipBrowser: Shows "Load more" button and entry count progress

Initial load fetches 100 entries, with more loaded on demand.

Co-Authored-By: Claude Opus 4.5 <[email protected]>
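
For illustration, the response shape described in this commit message could be sketched as below. The function name `paginate` and the `entries`, `offset`, and `limit` keys are assumptions. Only `total_count`, `has_more`, the `offset`/`limit` parameters, and the initial page size of 100 come from the commit message.

```python
def paginate(entries, total_count, offset=0, limit=100):
    """Build one page of a listing response from the entries parsed so far."""
    page = entries[offset:offset + limit]
    return {
        "entries": page,
        "offset": offset,
        "limit": limit,
        "total_count": total_count,
        # More pages remain if this page does not reach the end of the archive.
        "has_more": offset + len(page) < total_count,
    }
```

A client would keep requesting the next offset until `has_more` is false.
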
total_count = reader.cd_entries_count

# Parse entries up to offset + limit
reader.parse_central_directory(max_new_entries=offset + limit)

@mkitti mkitti Jan 28, 2026

I think this should only be `limit`, given the caching and resumable behavior, but I'm not completely sure about persistence between requests.
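
The trade-off in question can be illustrated with a toy model. `ToyReader` is a hypothetical stand-in, not the reader in this pull request, and it assumes that `max_new_entries` counts only newly parsed entries. Under that assumption, `limit` is sufficient when the reader persists between requests, while `offset + limit` is needed when each request starts from a fresh reader.

```python
class ToyReader:
    """Stand-in for a resumable central directory reader (hypothetical)."""

    def __init__(self, total):
        self.total = total
        self.parsed = 0  # entries cached so far

    def parse_central_directory(self, max_new_entries):
        new = min(max_new_entries, self.total - self.parsed)
        self.parsed += new
        return new


def entries_parsed_after_two_pages(budget, persistent, limit=100, total=1000):
    """Simulate two page requests and return the entries parsed at the end."""
    reader = ToyReader(total)
    for page in range(2):
        if not persistent:
            reader = ToyReader(total)  # cache lost between requests
        offset = page * limit
        reader.parse_central_directory(budget(offset, limit))
    return reader.parsed


def only_limit(offset, limit):
    return limit


def offset_plus_limit(offset, limit):
    return offset + limit
```

Serving the second page needs 200 parsed entries. In this model, a persistent reader given `limit` parses exactly 200, a persistent reader given `offset + limit` parses 300, and a fresh reader given only `limit` parses 100, which is too few.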
