Keep MFD dask output assembly lazy#2886
Merged
Merged
Conversation
The dask assembly step in flow_accumulation_mfd, flow_length_mfd, and watershed_mfd computed each tile and wrapped the numpy result with da.from_array / da.block, materializing the full output raster during the API call. Rebuild the result with da.map_blocks over the lazy fractions array, passing the small converged boundary snapshot into the block function so the per-tile kernel runs at compute time. The boundary-convergence sweep still reads tile data eagerly by design; lazifying that phase is tracked separately.
brendancol
commented
Jun 2, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Keep MFD dask output assembly lazy
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
None blocking. The change is well scoped and the tests cover the laziness contract directly.
Nits (optional improvements)
- The three
_tileclosures in flow_accumulation_mfd.py, flow_length_mfd.py, and watershed_mfd.py are nearly identical, including the searchsorted tile-index lookup. A shared helper could remove the duplication, but the duplication is small and the sibling rockout PRs touch these same files, so factoring it out now would raise the conflict risk. Reasonable to leave as is for this PR.
What looks good
- The map_blocks rework keeps the returned DataArray lazy while leaving the numpy and cupy paths untouched. The boundary-convergence sweep stays eager by necessity, and that limitation is documented in the docstrings and tracked in the follow-up issue #2885.
block_info[0]['array-location']reads from the input fractions coordinates, which is the right reference for the tile index even afterdrop_axis=0collapses the band axis. The watershed path confirms this works with a second 2-D input array.- The rechunk guard for the 8-band axis and the pour-points chunk alignment are both covered by new tests, including a deliberate chunk mismatch.
- Correctness is checked against the numpy path across single-chunk, many-chunk, and band-axis-chunked inputs, with equal_nan comparisons.
Checklist
- Algorithm matches reference: unchanged, kernels reused as is
- Backends consistent: numpy/cupy unchanged; dask+numpy verified against numpy in tests. dask+cupy shares the same iterative path but needs a GPU runner to verify directly.
- NaN handling correct: unchanged from prior tile kernels; equal_nan tests pass
- Edge cases covered: single tile, many tiles, band-axis chunked, pour-points mismatch
- Dask chunk boundaries handled: rechunk guard plus pour-points alignment
- No premature materialization: that is the point of this PR; assembly is now lazy
- Benchmark: not needed, no new public API
- README: not applicable, no new function
- Docstrings present and accurate: assembly helpers document the laziness contract
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2865.
What changed
flow_accumulation_mfd,flow_length_mfd, andwatershed_mfdused to.compute()each tile and wrap the numpy result withda.from_array/da.block, which materialized the whole output raster during the function call.da.map_blocksover the lazy fractions array. The converged boundary snapshot and fraction strips are small, so they ride along in a closure and the per-tile kernel runs at compute time. The returned DataArray is now genuinely lazy.watershed_mfd.The boundary-convergence sweep that runs before assembly still reads tile data eagerly. That is inherent to the algorithm and is tracked separately in #2885, along with the same assembly rework for the stream and hand MFD modules.
Backend coverage
numpy and cupy paths are unchanged. The lazy rework applies to dask+numpy and dask+cupy (the cupy variant converts to numpy, runs the same iterative path, and converts back).
Test plan
flow_accumulation_mfd,flow_length_mfd,watershed_mfddask results match numpy