Skip to content

Fix hotspots() silent all-zeros on degenerate dask inputs#2860

Merged
brendancol merged 2 commits into
mainfrom
issue-2843
Jun 3, 2026
Merged

Fix hotspots() silent all-zeros on degenerate dask inputs#2860
brendancol merged 2 commits into
mainfrom
issue-2843

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2843

What

  • The numpy and cupy hotspots() paths validate the global Gi* terms via _gistar_global_stats, raising for fewer than 2 valid cells (ValueError) or a zero global std (ZeroDivisionError). The dask paths kept those terms as lazy 0-d arrays and never ran the check, so constant, all-NaN, or single-valid-cell rasters classified to a silent all-zeros raster instead of raising.
  • Added _gistar_validate_lazy, a map_blocks step that re-applies the numpy-path check at compute time and returns the z-score unchanged. The graph stays lazy on call (issue hotspots() computes global stats eagerly for Dask input instead of staying lazy #2772); the error now fires at compute() to match numpy and cupy.

Backends

numpy and cupy unchanged. dask+numpy and dask+cupy now raise the same errors as numpy on degenerate input.

Test plan

  • test_hotspots_degenerate_*_2843 across numpy / cupy / dask+numpy / dask+cupy for constant, all-NaN, and single-valid-cell rasters
  • Existing hotspots tests still pass, including the dask laziness checks (test_hotspots_no_eager_compute)

The numpy and cupy hotspots paths validate the global Gi* terms through
_gistar_global_stats, which raises for fewer than 2 valid cells or a zero
global std. The dask paths kept those terms as lazy 0-d arrays and skipped
the check, so a constant, all-NaN, or single-valid-cell raster classified
to a silent all-zeros result instead of raising.

Add a lazy validation block (_gistar_validate_lazy) that re-applies the
numpy-path check at compute time and returns the z-score unchanged. The
dask graph stays lazy on call (issue #2772), and the error now fires at
compute() like the numpy and cupy backends.

Cover the three degenerate cases across numpy, cupy, dask+numpy, and
dask+cupy.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 2, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Fix hotspots() silent all-zeros on degenerate dask inputs

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None.

Nits (optional improvements)

  • _gistar_validate_lazy runs once per chunk via map_blocks, so the check repeats for every block of a multi-chunk raster. It is two scalar comparisons and only one block needs to raise, so this is harmless. Putting the guard in the block function (rather than a single up-front check) is what keeps the graph lazy, which is the point of #2772. (focal.py:1374)

What looks good

  • The fix keeps the laziness contract from #2772: validation is folded into the dask graph via map_blocks and fires at compute(), not at graph-build time. test_hotspots_no_eager_compute still passes.
  • _gistar_validate_lazy reuses _gistar_global_stats, so the dask error messages and exception types match numpy/cupy exactly (ValueError for n<2, ZeroDivisionError for std==0).
  • meta=z_array._meta keeps the chunk element type, so the dask+cupy path stays on-device. ._meta is already used this way in kde.py and utils.py.
  • Tests cover all three degenerate cases (constant, all-NaN, single valid cell) across numpy, cupy, dask+numpy, and dask+cupy, with the dask tests asserting the error fires at compute().

Checklist

  • Algorithm matches reference (no algorithm change; validation parity only)
  • All implemented backends produce consistent results
  • NaN handling is correct (all-NaN -> n=0 -> ValueError)
  • Edge cases are covered by tests
  • Dask chunk boundaries handled correctly (0-d scalars broadcast to every block)
  • No premature materialization (global terms stay lazy)
  • Benchmark not needed (bugfix, no perf change)
  • README feature matrix not applicable (no new function, no backend change)
  • Docstrings present and accurate

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after docs commit)

Re-reviewed after the second commit. The only change since the first review is a docs fix.

Changes since last review

  • docs/source/reference/dask_laziness.rst: the hotspots row still said "Partially lazy / Computes global mean and std," which went stale when #2772 made the dask path lazy. Updated it to "Fully lazy" with a note that the global terms stay lazy and the degenerate-input check fires at compute. This matches the code and test_hotspots_no_eager_compute.

Disposition of earlier findings

  • No Blockers or Suggestions were raised.
  • The single Nit (validation runs per block) is intentional: putting the guard inside the block function is what keeps the graph lazy per #2772. Dismissed, not a defect.

No new issues. Code is unchanged from the first review.

@brendancol brendancol merged commit 417c5b0 into main Jun 3, 2026
6 of 7 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

hotspots() silently returns all-zeros for degenerate dask inputs where NumPy/CuPy raise

1 participant