Skip to content

geotiff: align open_geotiff parameters with rioxarray's open_rasterio (#2961)#2963

Merged
brendancol merged 7 commits into
mainfrom
issue-2961
Jun 5, 2026
Merged

geotiff: align open_geotiff parameters with rioxarray's open_rasterio (#2961)#2963
brendancol merged 7 commits into
mainfrom
issue-2961

Conversation

@brendancol
Copy link
Copy Markdown
Contributor

Closes #2961

What changed

Aligns open_geotiff's read parameters with rioxarray's open_rasterio so callers can move between the two with minimal edits.

  • Renamed mask_nodata -> masked and flipped the default from True to False to match open_rasterio. A bare open_geotiff(path) no longer promotes the nodata sentinel to NaN; pass masked=True for the old behavior. mask_nodata stays as a deprecated alias (DeprecationWarning; passing both raises TypeError).
  • Renamed name -> default_name, with name kept as a deprecated alias on the same terms.
  • Added mask_and_scale: reads GDAL SCALE/OFFSET from attrs['gdal_metadata'], returns data * scale + offset, masks the sentinel, and records attrs['scale_factor']/attrs['add_offset']. No-op when the source carries no scale/offset.
  • Added parse_coordinates: False skips the x/y coordinate arrays while keeping attrs['transform']/attrs['crs'].
  • Added lock/cache for signature compatibility. xrspatial's reader re-opens the source per window, so these have no analog; passing a non-default value emits a GeoTIFFFallbackWarning rather than silently ignoring the kwarg.
  • chunks already matched rioxarray and is unchanged, as are the xrspatial-only parameters (gpu, window, bbox, max_pixels, allow_*, tier flags).

The deprecation aliases reuse the existing sentinel pattern from read_geotiff_gpu's gpu -> on_gpu_failure rename.

Scope note

masked and default_name thread through all four backends (eager, dask, GPU, VRT). mask_and_scale and parse_coordinates are implemented on the CPU eager and dask paths; combining either with gpu=True or a .vrt source raises ValueError with a message pointing at the supported path, matching how the dispatcher already gates on_gpu_failure/missing_sources/max_cloud_bytes per backend. This keeps the change inside the tiers where behavior is parity-gated rather than threading untested code through the GPU and VRT-mosaic readers.

Backend coverage

  • numpy: full (rename, default flip, mask_and_scale, parse_coordinates)
  • dask+numpy: full
  • cupy / dask+cupy: rename + default flip; mask_and_scale/parse_coordinates=False rejected with a clear error

Test plan

  • New xrspatial/geotiff/tests/read/test_rioxarray_compat_2961.py: default flip, deprecation warnings, both-passed TypeError, alias equivalence, mask_and_scale (eager + dask, no-metadata no-op, dtype=int raises), parse_coordinates, lock/cache warnings, GPU/VRT gating.
  • Migrated the existing suite for the default flip (masking-behavior tests now pass masked=True; canonical names replace deprecated-alias usage).
  • Full xrspatial/geotiff/tests suite green (GPU available in the dev env, so cupy/dask+cupy/gpu params ran).
  • xrspatial/tests/test_reproject.py, test_rasterize.py, test_accessor.py green (the reproject datum-grid fallback now matches its rasterio primary path, which also does not mask).

The user-guide notebook and README feature-matrix steps were skipped: this is parameter alignment on an existing function, not a new spatial operation or backend-support change.

Rename mask_nodata->masked (default flips True->False) and name->default_name
with deprecation aliases; add mask_and_scale, parse_coordinates, lock, cache.
mask_and_scale/parse_coordinates implemented on eager+dask CPU paths; gated
for gpu/vrt. lock/cache accept+warn.
@github-actions github-actions Bot added the performance PR touches performance-sensitive code label Jun 5, 2026
Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

PR Review: align open_geotiff parameters with rioxarray's open_rasterio

Reviewed the four source files in full (__init__.py, _attrs.py, _backends/dask.py, _runtime.py) plus the test and doc changes. Full geotiff suite is green locally (6092 passed) with GPU present, and the new compat tests cover the renames, the default flip, both-passed errors, mask_and_scale eager/dask, parse_coordinates, the lock/cache warnings, and the GPU/VRT gates.

Blockers (must fix before merge)

None.

Suggestions (should fix, not blocking)

  • Round-trip double-apply with mask_and_scale. In _attrs.py (_finalize_eager_read) and _backends/dask.py, after applying data * scale + offset the source's attrs['gdal_metadata'] still carries the raw SCALE/OFFSET items. If a caller reads with mask_and_scale=True and then writes the result with to_geotiff, the writer re-embeds those items, so a later mask_and_scale=True read applies the scale a second time. rioxarray sidesteps this by moving the values to encoding. Either drop SCALE/OFFSET from the propagated gdal_metadata once they have been applied, or document the round-trip caveat on the mask_and_scale parameter.
  • Resolution edge case in open_geotiff (__init__.py). The "both passed" guard is if mask_nodata is not <sentinel>: if masked is not False: raise. Because masked has a real default of False, an explicit masked=False cannot be told apart from the default, so open_geotiff(path, masked=False, mask_nodata=True) does not raise and silently resolves to masking on. This matches the documented stance in read_geotiff_gpu, but a one-line comment in the resolution block noting the case would save the next reader the trace.

Nits (optional improvements)

  • _extract_scale_offset (_attrs.py) applies one scale/offset pair to all bands and falls back to band 0 for per-band metadata. The docstring already says this; consider a warning when per-band values actually disagree so a multi-band source with mismatched scaling does not silently use band 0.
  • __init__.py computes _is_vrt_source_early for the new gate and then _is_vrt_source again a few lines later. Harmless, but the second could reuse the first.

What looks good

  • The deprecation aliases reuse the existing sentinel pattern from the gpu -> on_gpu_failure rename rather than inventing a new mechanism, including the both-passed TypeError.
  • mask_and_scale is verified to match between the eager and dask paths, and the dtype=<int> + scale combination raises through the existing float-to-int guard.
  • The default flip is the risk here, and the test migration is thorough: masking-behavior tests now opt in with masked=True, and the two tests that pinned the old default were rewritten to pin the new one.
  • Scoping mask_and_scale / parse_coordinates=False to the CPU eager and dask paths with an explicit ValueError for GPU and VRT keeps the change inside the parity-gated tiers instead of threading untested code through the GPU and VRT readers.

Checklist

  • Behavior matches rioxarray's open_rasterio for the aligned parameters
  • Renames thread through all four backends; new behavioral options gated where unsupported
  • NaN / sentinel masking correct under the new default
  • Edge cases covered (no-metadata no-op, dtype clash, both-passed, deprecation warnings)
  • Dask path matches eager for mask_and_scale
  • No premature materialization (dask scale/offset stays lazy)
  • README feature matrix not applicable (no new function)
  • Docstrings present and updated

Copy link
Copy Markdown
Contributor Author

@brendancol brendancol left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Follow-up review (after commit a59264c)

Disposition of the findings from the first pass:

  • Fixed: round-trip double-apply with mask_and_scale. The reader sets both attrs['gdal_metadata'] and attrs['gdal_metadata_xml'], and to_geotiff prefers the XML, so stripping only the dict would not have prevented re-embedding and stripping the serialized XML pulls the writer path into this read-focused change. Documented the caveat on the mask_and_scale parameter instead (drop the scale/offset tags and scale_factor/add_offset before writing if a clean round-trip matters). Filed as the proportionate fix; a writer-side strip can be a follow-up if it comes up in practice.
  • Fixed: added a comment in the open_geotiff resolution block spelling out the masked=False + mask_nodata=True edge case.
  • Fixed (nit): unified the duplicated _is_vrt_source computation; the gate and the later VRT branch now share one binding.
  • Dismissed (nit): warning when per-band SCALE/OFFSET disagree. The band-0 fallback is already documented, the case is uncommon, and emitting warnings from the low-level attrs helper would expand the change without much payoff.

Full xrspatial/geotiff/tests suite is green after the follow-up (6114 passed, 81 skipped, 1 xfailed) with GPU present.

# Conflicts:
#	xrspatial/geotiff/__init__.py
#	xrspatial/geotiff/_backends/dask.py
#	xrspatial/geotiff/tests/read/test_nodata.py
#	xrspatial/geotiff/tests/release_gates/test_stable_features.py
#	xrspatial/geotiff/tests/vrt/test_metadata.py
@brendancol brendancol merged commit c250d90 into main Jun 5, 2026
7 checks passed
@brendancol
Copy link
Copy Markdown
Contributor Author

#2931

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

performance PR touches performance-sensitive code

Projects

None yet

Development

Successfully merging this pull request may close these issues.

geotiff: align open_geotiff parameters with rioxarray's open_rasterio

1 participant