Validate MFD fraction values in public consumers (#2873)#2883
Open
brendancol wants to merge 2 commits into
Open
Validate MFD fraction values in public consumers (#2873)#2883brendancol wants to merge 2 commits into
brendancol wants to merge 2 commits into
Conversation
Add _validate_mfd_fractions to utils and call it after the shape check in the seven public MFD functions: flow_accumulation_mfd, flow_length_mfd, stream_order_mfd, stream_link_mfd, flow_path_mfd, hand_mfd, watershed_mfd. The check rejects negative fractions, per-cell band sums that are neither ~1.0 nor ~0.0, and partial-NaN band patterns, raising a clear ValueError before any hydrology math runs. numpy and cupy inputs are validated eagerly; dask inputs are skipped so laziness is preserved.
brendancol
commented
Jun 2, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
PR Review: Validate MFD fraction values in public consumers (#2873)
Blockers (must fix before merge)
None.
Suggestions (should fix, not blocking)
utils.py:244buildsfinite = data[~nan_mask]just to test for negatives, which copies the whole grid when nothing is NaN. You can skip the copy:bool((data < 0).any())does the same thing, becauseNaN < 0is False so NaN cells never trip it. Only matters for big grids, but the copy is avoidable.
Nits (optional improvements)
- The three error messages each rebuild the
func_name(): name ...prefix. With only three it's fine, but a local prefix variable would trim the repetition. - The helper docstring notes that only numpy and cupy inputs get value-checked. Worth saying the same in the public function docstrings so nobody assumes a dask grid gets the same guard. Optional.
What looks good
- The check sits after the shape check and before any hydrology runs in all seven consumers, so bad input fails early like it should.
- Sinks (sum 0.0) and all-NaN edge cells both still pass, and the tests assert that.
- Partial-NaN detection via the per-cell band count (0 < count < 8) is the right topology test.
- Dask is skipped on purpose to keep laziness intact, which keeps this PR in scope and out of the dask-laziness sibling PR's way.
- Tests cover negative, bad-sum, and partial-NaN across all seven functions, plus the valid-input and dask-skip paths.
Checklist
- Matches the documented fraction contract
- NaN handling correct (partial vs all-NaN distinguished)
- Edge cases covered (sink, all-NaN edge, valid grid)
- Dask path intentionally skipped and documented
- No new public function, so no benchmark/README needed
- Helper has a full docstring
brendancol
commented
Jun 2, 2026
Contributor
Author
brendancol
left a comment
There was a problem hiding this comment.
Follow-up review (after 2b030e3)
Re-checked the helper in xrspatial/utils.py after the review fixes.
Resolved
- Negative check no longer copies the grid. It now uses
bool((data < 0).any()), and NaN cells don't trip it sinceNaN < 0is False. Confirmed no RuntimeWarning under-W error::RuntimeWarning. - The repeated
func_name(): nameprefix is now a single localprefixvariable shared by all three error messages.
Still open (deferred, not blocking)
- Adding a "dask inputs aren't value-checked" note to each of the seven public docstrings. Deferred: it would edit docstrings in all seven MFD files, which are being touched by sibling PRs right now, so the extra conflict surface isn't worth it. The helper's own docstring already documents the dask-skip behavior. Can fold into a later docs pass.
Tests: full MFD suite plus the new validation tests pass (151 passed).
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Closes #2873
What
The public MFD functions document a fraction-grid contract: bands in [0, 1], per-cell sum of 1.0 or 0.0, all-NaN bands at edges. They only enforced the (8, H, W) shape, though. Feed them bad fraction values and you got wrong-but-plausible hydrology back instead of an error.
This adds
_validate_mfd_fractionstoxrspatial/utils.pyand calls it right after the shape check in all seven public consumers:The check raises a clear ValueError for negative fractions, per-cell band sums that are neither ~1.0 nor ~0.0 (tolerance 1e-6), and partial-NaN cells (some of the 8 bands NaN, the rest finite).
Backends
numpy and cupy inputs are validated eagerly. Dask inputs (numpy- or cupy-backed) skip the value check so laziness is preserved; lazy validation is a separate concern tracked elsewhere.
Scope
Fraction value and topology only. Secondary-raster shape validation, cyclic-graph detection, and dask-laziness are sibling PRs and aren't touched here.
Test plan
test_validate_mfd_fractions.pycover negative, bad-sum, and partial-NaN rejection across all seven functionsflow_direction_mfdstill pass, including sink cells that sum to 0.0 and all-NaN edge cells