Skip to content

Fix bounded-dask proximity crash on skinny rasters#2877

Merged
brendancol merged 2 commits into
mainfrom
issue-2854
Jun 3, 2026
Merged

Fix bounded-dask proximity crash on skinny rasters#2877
brendancol merged 2 commits into
mainfrom
issue-2854

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2854

What

  • _halo_depth() can return a pixel search radius larger than the raster height or width. That depth went straight into da.map_overlap, which rejects a depth larger than the array along an axis and raised ValueError: The overlapping depth ... is larger than your array .... Reproduced on a 3x100 raster with one target and max_distance=10.
  • Add _fit_halo_to_chunks(): when the halo is deeper than the smallest chunk on an axis, fold that axis into a single chunk and set its depth to 0. Every chunk then sees the full axis, so no target within max_distance is dropped and the result still matches the numpy backend. Applied to both the dask+numpy and dask+cupy map_overlap call sites.

Backend coverage

Fix touches the dask+numpy and dask+cupy paths. numpy and cupy (non-dask) paths are unaffected. The normal (non-skinny) dask path keeps its existing chunking, so there is no extra materialisation in the common case.

Test plan

  • New regression test test_bounded_dask_skinny_raster_matches_numpy covers the 3x100 skinny raster across proximity, allocation, and direction, pinned against the numpy backend.
  • Full test_proximity.py passes (282 tests).
  • Verified manually: multi-chunk skinny raster and a great-circle skinny raster near the pole also match numpy.

_halo_depth() can return a pixel search radius larger than the raster
height or width. That depth was passed straight to da.map_overlap, which
rejects a depth larger than the array along an axis and raised
"ValueError: The overlapping depth ... is larger than your array ...".

Add _fit_halo_to_chunks(): when the halo is deeper than the smallest
chunk on an axis, fold that axis into a single chunk and drop its depth
to zero. Every chunk then sees the full axis, so no target within
max_distance is missed and the result still matches the numpy backend.
Applied to both the dask+numpy and dask+cupy map_overlap call sites.

Add a regression test covering the 3x100 skinny raster across proximity,
allocation, and direction, pinned against the numpy backend.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 2, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Fix bounded-dask proximity crash on skinny rasters

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • xrspatial/proximity.py:307 _fit_halo_to_chunks rechunks an over-deep axis to a single chunk instead of just clamping the depth. That is the correct call: clamping the depth while keeping multiple chunks would silently drop targets that live in non-adjacent chunks. The cost is that the folded axis is fully materialised per task, which is cheap for the skinny axis that triggers it. Worth a one-line docstring note that the fold trades chunking on that axis for correctness, so a later reader does not "optimise" it back to a bare depth clamp.

What looks good

  • The fix lives at the two map_overlap call sites (dask+numpy at proximity.py:1431, dask+cupy at proximity.py:614) and leaves _halo_depth alone. The depth math is unchanged; only the chunking moves, and only when it has to.
  • Folding the over-deep axis to a single chunk and zeroing its depth is correct. Every chunk then sees the whole axis, so no target within max_distance is lost. Checked against numpy for the skinny case, a multi-chunk skinny case, and a great-circle skinny case near the pole.
  • The common non-skinny path is untouched. A halo smaller than the chunk leaves the chunking alone, so there is no extra materialisation in the normal case.
  • The numpy path reassigns raster.data, which matches the unbounded branch right above it. The cupy path uses a local raster_data to avoid mutating the cupy-backed input. The asymmetry is deliberate.
  • Regression test covers proximity, allocation, and direction on the 3x100 raster, pinned against numpy with equal_nan.

Checklist

  • Algorithm correctness: depth math unchanged; chunking fold is correct.
  • Backend parity: dask+numpy and dask+cupy both fixed; numpy/cupy paths untouched. The dask+cupy fold is not exercised in CI (no GPU) but mirrors the numpy path.
  • NaN handling: unchanged; boundary=np.nan still used.
  • Edge cases: skinny, multi-chunk skinny, and great-circle skinny near the pole all checked.
  • Dask chunk boundaries: handled by the rechunk-to-single-chunk fold.
  • No premature materialisation in the common path.
  • Docstring present on the new helper.
  • README/benchmark: not applicable (pure bug fix, no new public API).

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after 9f2c39e)

The single nit from the first pass is addressed: _fit_halo_to_chunks now carries a docstring note that the single-chunk fold is deliberate and must not be swapped for a bare depth clamp (which would drop targets in non-adjacent chunks).

No new findings. No blockers or suggestions outstanding. The skinny-raster regression test still passes and flake8 is clean on the changed file.

@brendancol brendancol merged commit 4c56bfc into main Jun 3, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Bounded Dask proximity crashes when halo depth exceeds an axis length

1 participant