feat: support pyarrow float16 by widening to float on read/write#3590

Open
anxkhn wants to merge 1 commit into
apache:mainfrom
anxkhn:loop/iceberg-python__003

anxkhn commented Jun 30, 2026

Rationale for this change

PyArrow float16 (halffloat) currently raises UnsupportedPyArrowTypeException during schema conversion, because _ConvertToIceberg.primitive only handles float32 and float64:

>>> import pyarrow as pa
>>> from pyiceberg.io.pyarrow import _ConvertToIceberg, visit_pyarrow
>>> visit_pyarrow(pa.schema([pa.field("x", pa.float16())]), _ConvertToIceberg())
pyiceberg.exceptions.UnsupportedPyArrowTypeException: Column 'x' has an unsupported type: halffloat

Iceberg has no half-precision float type, but widening float16 to float32 is lossless: every IEEE 754 half-precision value (including subnormals, ±infinity, and the maximum finite value 65504) is exactly representable in single precision. This mirrors how the same method already widens int8/int16 to IntegerType, and how ArrowProjectionVisitor._cast_if_needed already widens smaller integer types for cross-platform compatibility. Mapping float16 -> FloatType is the floating-point analogue, so float16 columns can be written (and read back as float32) instead of raising.
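The losslessness claim can be checked exhaustively with only the standard library. This is a verification aid, not part of the PR: it assumes Python's struct module (format "e" = IEEE 754 binary16, "f" = binary32) as a stand-in for PyArrow's float16/float32, and the helper names are hypothetical. It decodes all 65,536 half-precision bit patterns, pushes each one through float32, and counts how many come back unchanged.

```python
import math
import struct


def half_bits_to_float(bits: int) -> float:
    """Decode a 16-bit IEEE 754 half-precision bit pattern (hypothetical helper)."""
    return struct.unpack("<e", struct.pack("<H", bits))[0]


def widen_to_float32(value: float) -> float:
    """Round-trip a value through IEEE 754 single precision (float32)."""
    # struct.pack("<f", ...) rounds to float32; unpacking gives it back as a Python float.
    return struct.unpack("<f", struct.pack("<f", value))[0]


def check_half_to_single_is_lossless() -> int:
    """Count how many of the 65,536 half-precision bit patterns survive widening unchanged."""
    preserved = 0
    for bits in range(1 << 16):
        half = half_bits_to_float(bits)
        single = widen_to_float32(half)
        if math.isnan(half):
            # NaN never compares equal to itself; only require that it stays NaN.
            ok = math.isnan(single)
        else:
            # Exact value equality plus matching sign, so +0.0 and -0.0 stay distinct.
            ok = single == half and math.copysign(1.0, single) == math.copysign(1.0, half)
        preserved += ok
    return preserved


if __name__ == "__main__":
    print(check_half_to_single_is_lossless())  # 65536
    # 0x7BFF is the largest finite half-precision value.
    print(widen_to_float32(half_bits_to_float(0x7BFF)))  # 65504.0
```

Because every pattern survives the trip through float32 unchanged, a float16 -> float32 cast on write cannot lose information; NaN payload bits are deliberately not compared.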

Changes (pyiceberg/io/pyarrow.py):

  • _ConvertToIceberg.primitive: map pa.float16() -> FloatType().
  • ArrowProjectionVisitor._cast_if_needed: widen smaller float types to the target type on write (parallel to the existing integer-widening branch), so float16 arrays are cast to float32. Narrowing falls through to the existing promote() handling.
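The two rules in the list above can be modeled as a small decision table. This is a hypothetical sketch, not the PR's code: it uses plain strings (PyArrow's display names halffloat/float/double and Iceberg's float/double) instead of pyarrow or pyiceberg objects, and ARROW_FLOAT_BITS, iceberg_type_for and float_cast_action are invented names. It shows that the schema mapping sends anything up to 32 bits to Iceberg float, and that the write path casts only when the source is strictly narrower than the target.

```python
# Hypothetical model of the two rules; these names are illustrative and are
# NOT part of pyiceberg's or pyarrow's API.

# Bit widths keyed by PyArrow's string names for its float types
# (str(pa.float16()) == "halffloat", str(pa.float32()) == "float", str(pa.float64()) == "double").
ARROW_FLOAT_BITS = {"halffloat": 16, "float": 32, "double": 64}

# Bit widths of Iceberg's floating-point primitive types.
ICEBERG_FLOAT_BITS = {"float": 32, "double": 64}


def iceberg_type_for(arrow_float: str) -> str:
    """Schema rule: float16 and float32 map to Iceberg float, float64 to double."""
    return "float" if ARROW_FLOAT_BITS[arrow_float] <= 32 else "double"


def float_cast_action(arrow_float: str, iceberg_target: str) -> str:
    """Write rule: widen when the source is narrower, otherwise defer."""
    source_bits = ARROW_FLOAT_BITS[arrow_float]
    target_bits = ICEBERG_FLOAT_BITS[iceberg_target]
    if source_bits < target_bits:
        return "widen"  # e.g. halffloat -> float32, float32 -> float64
    if source_bits == target_bits:
        return "keep"  # already the physical width the target expects
    return "promote"  # narrowing: left to the existing promote() handling
```

For example, float_cast_action("halffloat", "float") and float_cast_action("float", "double") both return "widen", matching the f16 -> Float and f32 -> Double test cases, while float_cast_action("double", "float") returns "promote" and is never cast by the new branch.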

No dependency changes; pyproject.toml / uv.lock are untouched and the imports used were already present.

A design choice I'd defer to maintainers on: widening float16 silently (rather than raising or gating it behind a config flag) follows the existing int8/int16 -> IntegerType precedent. I'm happy to gate it behind a config option instead if you'd prefer. Note that the new float-widening branch also makes float32 -> DoubleType actually cast the array (as the integer-widening branch does), so it slightly tightens float promotion in general, not just for float16.

Are these changes tested?

Yes:

  • tests/io/test_pyarrow_visitor.py::test_pyarrow_float16_to_iceberg asserts the schema mapping pa.float16() -> FloatType().
  • tests/io/test_pyarrow.py::test__to_requested_schema_float_promotion is parametrized over f16 -> Float, f16 -> Double, and f32 -> Double, asserting both the written PyArrow type and that the values are preserved.

Both pass locally, as do the surrounding visitor suite and the sibling integer-promotion test, and make lint (ruff, ruff-format, mypy, pydocstyle, codespell, uv-lock) is clean. The integration suite (Docker/Spark) was not run locally.

Are there any user-facing changes?

Yes. PyArrow tables with float16 columns can now be converted and written through PyIceberg (they map to Iceberg float and are stored as float32); previously they raised UnsupportedPyArrowTypeException. The change is additive: existing float32/float64 schema mappings are unchanged, and the only difference for them is the explicit float32 -> double cast on write described in the design note above.

PyArrow's float16 (halffloat) raised UnsupportedPyArrowTypeException
during schema conversion because _ConvertToIceberg.primitive only
handled float32/float64. Iceberg has no half-precision float, but
float16 -> float32 is lossless, mirroring how int8/int16 already widen
to IntegerType. Map float16 to FloatType, and widen smaller float arrays
to the target type in ArrowProjectionVisitor._cast_if_needed (parallel
to the integer-widening branch) so float16 columns write as float32.