Skip to content

Keep MFD dask output assembly lazy#2886

Merged
brendancol merged 1 commit into
mainfrom
issue-2865
Jun 3, 2026
Merged

Keep MFD dask output assembly lazy#2886
brendancol merged 1 commit into
mainfrom
issue-2865

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2865.

What changed

  • The dask assembly step in flow_accumulation_mfd, flow_length_mfd, and watershed_mfd used to .compute() each tile and wrap the numpy result with da.from_array / da.block, which materialized the whole output raster during the function call.
  • Rebuilt the assembly with da.map_blocks over the lazy fractions array. The converged boundary snapshot and fraction strips are small, so they ride along in a closure and the per-tile kernel runs at compute time. The returned DataArray is now genuinely lazy.
  • Added a rechunk guard so the 8 direction bands stay in one chunk (every tile kernel needs all 8), and aligned the pour-points chunks to the fractions tile grid in watershed_mfd.

The boundary-convergence sweep that runs before assembly still reads tile data eagerly. That is inherent to the algorithm and is tracked separately in #2885, along with the same assembly rework for the stream and hand MFD modules.

Backend coverage

numpy and cupy paths are unchanged. The lazy rework applies to dask+numpy and dask+cupy (the cupy variant converts to numpy, runs the same iterative path, and converts back).

Test plan

  • flow_accumulation_mfd, flow_length_mfd, watershed_mfd dask results match numpy
  • New test asserts the assembly kernel runs at compute time, not during the call
  • New test covers an input chunked along the 8-band axis (rechunk guard)
  • New test covers pour points chunked differently from fractions
  • Full hydro suite passes (860 tests)

The dask assembly step in flow_accumulation_mfd, flow_length_mfd, and
watershed_mfd computed each tile and wrapped the numpy result with
da.from_array / da.block, materializing the full output raster during
the API call. Rebuild the result with da.map_blocks over the lazy
fractions array, passing the small converged boundary snapshot into the
block function so the per-tile kernel runs at compute time.

The boundary-convergence sweep still reads tile data eagerly by design;
lazifying that phase is tracked separately.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Keep MFD dask output assembly lazy

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

None blocking. The change is well scoped and the tests cover the laziness contract directly.

Nits (optional improvements)

  • The three _tile closures in flow_accumulation_mfd.py, flow_length_mfd.py, and watershed_mfd.py are nearly identical, including the searchsorted tile-index lookup. A shared helper could remove the duplication, but the duplication is small and the sibling rockout PRs touch these same files, so factoring it out now would raise the conflict risk. Reasonable to leave as is for this PR.

What looks good

  • The map_blocks rework keeps the returned DataArray lazy while leaving the numpy and cupy paths untouched. The boundary-convergence sweep stays eager by necessity, and that limitation is documented in the docstrings and tracked in the follow-up issue #2885.
  • block_info[0]['array-location'] reads from the input fractions coordinates, which is the right reference for the tile index even after drop_axis=0 collapses the band axis. The watershed path confirms this works with a second 2-D input array.
  • The rechunk guard for the 8-band axis and the pour-points chunk alignment are both covered by new tests, including a deliberate chunk mismatch.
  • Correctness is checked against the numpy path across single-chunk, many-chunk, and band-axis-chunked inputs, with equal_nan comparisons.

Checklist

  • Algorithm matches reference: unchanged, kernels reused as is
  • Backends consistent: numpy/cupy unchanged; dask+numpy verified against numpy in tests. dask+cupy shares the same iterative path but needs a GPU runner to verify directly.
  • NaN handling correct: unchanged from prior tile kernels; equal_nan tests pass
  • Edge cases covered: single tile, many tiles, band-axis chunked, pour-points mismatch
  • Dask chunk boundaries handled: rechunk guard plus pour-points alignment
  • No premature materialization: that is the point of this PR; assembly is now lazy
  • Benchmark: not needed, no new public API
  • README: not applicable, no new function
  • Docstrings present and accurate: assembly helpers document the laziness contract

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 2, 2026
@brendancol brendancol merged commit 8b89e6e into main Jun 3, 2026
10 of 12 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MFD dask paths materialize the output raster eagerly during the API call

1 participant