Improve Xenium performance, fix multinucleate cells bug #376

Merged
LucaMarconato merged 3 commits into main from more-xenium-performance
Feb 25, 2026
LucaMarconato (Member) commented Feb 25, 2026

The last commit message summarizes this PR, so it is copied below.

Refactor xenium polygon reader to use zarr-based indices mapping

Replace slow parquet string-based cell_id grouping with zarr polygon_sets
for both nucleus and cell boundaries. This fixes a multinucleate cell bug
where multiple nuclei sharing the same cell_id were incorrectly merged
into a single polygon, and improves performance by avoiding expensive
string operations on large parquet columns.
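
To make the multinucleate bug concrete, here is a minimal sketch on toy data. It is not the spatialdata-io implementation: the vertex-table layout, the values, and the helper `count_polygons` are assumptions made for illustration. It only shows why grouping boundary vertices by `cell_id` yields fewer polygons than grouping by an integer per-nucleus label when two nuclei share a cell.

```python
import numpy as np

# Hypothetical toy table, one row per boundary vertex (not real Xenium data).
# Cell "aaaa-1" has two nuclei, so two label_id values share one cell_id.
cell_id = np.array(["aaaa-1"] * 4 + ["aaaa-1"] * 4 + ["bbbb-1"] * 4)
label_id = np.array([1] * 4 + [2] * 4 + [3] * 4)


def count_polygons(keys: np.ndarray) -> int:
    """Number of polygons produced when vertices are grouped by `keys`."""
    return int(np.unique(keys).size)


# Grouping by cell_id merges the two nuclei of "aaaa-1" into one polygon.
polygons_by_cell_id = count_polygons(cell_id)
# Grouping by the integer label keeps every nucleus separate.
polygons_by_label_id = count_polygons(label_id)

print(polygons_by_cell_id, polygons_by_label_id)  # prints: 2 3
```

Keeping `cell_id` as a regular column next to the integer index still lets each nucleus be traced back to its cell.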

Key changes:

  • Split _get_labels_and_indices_mapping into focused functions:
    _get_labels, _get_indices_mapping_from_zarr, _get_indices_mapping_legacy
  • Nucleus boundaries now use integer label_index as GeoDataFrame index
    (with cell_id as a column), correctly handling multinucleate cells
  • Cell boundaries keep string cell_id as GeoDataFrame index (legacy)
  • Read only needed parquet columns for faster I/O
  • Use integer label_id for fast change detection when available
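
The last bullet can be sketched as follows, assuming that the vertices of each polygon are stored contiguously and that an integer `label_id` array is available. The function name `polygon_starts` is hypothetical and the code is an illustration of the idea, not the reader's actual code. Comparing fixed-width integers element-wise avoids the per-element string comparisons that an object-dtype `cell_id` column would need.

```python
import numpy as np


def polygon_starts(label_id: np.ndarray) -> np.ndarray:
    """Return the start offset of each run of equal consecutive labels.

    Assumes (hypothetically) that all vertices of one polygon are adjacent.
    """
    if label_id.size == 0:
        return np.empty(0, dtype=np.int64)
    # Positions where the label differs from the previous row.
    change = np.flatnonzero(label_id[1:] != label_id[:-1]) + 1
    return np.concatenate(([0], change))


labels = np.array([5, 5, 5, 7, 7, 9])
starts = polygon_starts(labels)
print(starts.tolist())  # prints: [0, 3, 5]
```

Each pair of consecutive offsets delimits the vertex rows of one polygon, so the rows never need to be grouped by a string key.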

LucaMarconato and others added 2 commits February 25, 2026 00:10
Co-Authored-By: Claude Opus 4.6 <noreply@anthropic.com>

codecov-commenter commented Feb 25, 2026

Codecov Report

❌ Patch coverage is 77.55102% with 11 lines in your changes missing coverage. Please review.
✅ Project coverage is 61.65%. Comparing base (9011cfd) to head (c144815).
⚠️ Report is 4 commits behind head on main.

Files with missing lines                Patch %   Lines
src/spatialdata_io/readers/xenium.py    77.55%    11 Missing ⚠️
Additional details and impacted files
@@            Coverage Diff             @@
##             main     #376      +/-   ##
==========================================
- Coverage   61.80%   61.65%   -0.15%     
==========================================
  Files          27       27              
  Lines        3155     3161       +6     
==========================================
- Hits         1950     1949       -1     
- Misses       1205     1212       +7     
Files with missing lines                Coverage Δ
src/spatialdata_io/readers/xenium.py    71.04% <77.55%> (-1.44%) ⬇️

LucaMarconato merged commit a20bbaf into main Feb 25, 2026
5 checks passed
LucaMarconato deleted the more-xenium-performance branch February 25, 2026 19:31