Skip to content

Validate MFD fraction values in public consumers (#2873)#2883

Open
brendancol wants to merge 2 commits into
mainfrom
issue-2873
Open

Validate MFD fraction values in public consumers (#2873)#2883
brendancol wants to merge 2 commits into
mainfrom
issue-2873

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2873

What

The public MFD functions document a fraction-grid contract: bands in [0, 1], per-cell sum of 1.0 or 0.0, all-NaN bands at edges. They only enforced the (8, H, W) shape, though. Feed them bad fraction values and you got wrong-but-plausible hydrology back instead of an error.

This adds _validate_mfd_fractions to xrspatial/utils.py and calls it right after the shape check in all seven public consumers:

  • flow_accumulation_mfd
  • flow_length_mfd
  • stream_order_mfd
  • stream_link_mfd
  • flow_path_mfd
  • hand_mfd
  • watershed_mfd

The check raises a clear ValueError for negative fractions, per-cell band sums that are neither ~1.0 nor ~0.0 (tolerance 1e-6), and partial-NaN cells (some of the 8 bands NaN, the rest finite).

Backends

numpy and cupy inputs are validated eagerly. Dask inputs (numpy- or cupy-backed) skip the value check so laziness is preserved; lazy validation is a separate concern tracked elsewhere.

Scope

Fraction value and topology only. Secondary-raster shape validation, cyclic-graph detection, and dask-laziness are sibling PRs and aren't touched here.

Test plan

  • New tests in test_validate_mfd_fractions.py cover negative, bad-sum, and partial-NaN rejection across all seven functions
  • Valid grids from flow_direction_mfd still pass, including sink cells that sum to 0.0 and all-NaN edge cells
  • Dask input with invalid values does not raise eagerly
  • Existing MFD suite (144 tests) still green

Add _validate_mfd_fractions to utils and call it after the shape check
in the seven public MFD functions: flow_accumulation_mfd,
flow_length_mfd, stream_order_mfd, stream_link_mfd, flow_path_mfd,
hand_mfd, watershed_mfd.

The check rejects negative fractions, per-cell band sums that are
neither ~1.0 nor ~0.0, and partial-NaN band patterns, raising a clear
ValueError before any hydrology math runs. numpy and cupy inputs are
validated eagerly; dask inputs are skipped so laziness is preserved.
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: Validate MFD fraction values in public consumers (#2873)

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • utils.py:244 builds finite = data[~nan_mask] just to test for negatives, which copies the whole grid when nothing is NaN. You can skip the copy: bool((data < 0).any()) does the same thing, because NaN < 0 is False so NaN cells never trip it. Only matters for big grids, but the copy is avoidable.

Nits (optional improvements)

  • The three error messages each rebuild the func_name(): name ... prefix. With only three it's fine, but a local prefix variable would trim the repetition.
  • The helper docstring notes that only numpy and cupy inputs get value-checked. Worth saying the same in the public function docstrings so nobody assumes a dask grid gets the same guard. Optional.

What looks good

  • The check sits after the shape check and before any hydrology runs in all seven consumers, so bad input fails early like it should.
  • Sinks (sum 0.0) and all-NaN edge cells both still pass, and the tests assert that.
  • Partial-NaN detection via the per-cell band count (0 < count < 8) is the right topology test.
  • Dask is skipped on purpose to keep laziness intact, which keeps this PR in scope and out of the dask-laziness sibling PR's way.
  • Tests cover negative, bad-sum, and partial-NaN across all seven functions, plus the valid-input and dask-skip paths.

Checklist

  • Matches the documented fraction contract
  • NaN handling correct (partial vs all-NaN distinguished)
  • Edge cases covered (sink, all-NaN edge, valid grid)
  • Dask path intentionally skipped and documented
  • No new public function, so no benchmark/README needed
  • Helper has a full docstring

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after 2b030e3)

Re-checked the helper in xrspatial/utils.py after the review fixes.

Resolved

  • Negative check no longer copies the grid. It now uses bool((data < 0).any()), and NaN cells don't trip it since NaN < 0 is False. Confirmed no RuntimeWarning under -W error::RuntimeWarning.
  • The repeated func_name(): name prefix is now a single local prefix variable shared by all three error messages.

Still open (deferred, not blocking)

  • Adding a "dask inputs aren't value-checked" note to each of the seven public docstrings. Deferred: it would edit docstrings in all seven MFD files, which are being touched by sibling PRs right now, so the extra conflict surface isn't worth it. The helper's own docstring already documents the dask-skip behavior. Can fold into a later docs pass.

Tests: full MFD suite plus the new validation tests pass (151 passed).

@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 2, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

MFD fraction-grid functions accept invalid fractions without validation

1 participant