Skip to content

Conversation

@coderfender
Copy link
Contributor

@coderfender coderfender commented Feb 11, 2026

Which issue does this PR close?

Part 1 of #3459

Rationale for this change

Cast module is currently 3700 LOC and the goal is to refactor this into separate files based on datatype

What changes are included in this PR?

Functional changes :

  1. We had a bug where can_cast_to_boolean would not support cast to Utf8 but the matches macro did. This was mostly due to code duplication among various modules. Also
  2. Identity cast support miss : Although we check if the cast is an identity op in previous parts of code, I made sure that we support identity casts in can_cast_from_boolean to make it consistent

Refactoring Changes

(From) Boolean refactor:

Code :

  1. boolean.rs: can_cast_from_boolean(), cast_boolean_to_decimal(), and unit tests (Since all casts except to decimal are datafusion compatible)

Unit tests:

  1. Created new unit tests to cast from boolean to new data types

Benchmarks:

Created new benchmarks to assess performance

Utils.rs

  1. moved is_identity_cast() [NEW] , spark_cast_postprocess(), cast_overflow(), invalid_value() to utils.rs

How are these changes tested?

@coderfender
Copy link
Contributor Author

@andygrove , @comphead , This is the first part of refactoring out cast ops by data type (and move common functions to utilities). I also went ahead and added tests and benchmarks. Subsequent PRs will follow a similar approach with refactor and all the cast ops are static dispatch with room for additional standardization in the future

@coderfender
Copy link
Contributor Author

CI passed. I believe the PR is ready for your review @andygrove , @comphead

@coderfender
Copy link
Contributor Author

Rebased with main and moved cast_boolean_to_decimal to boolean cast file

@andygrove
Copy link
Member

Thanks @coderfender. Could you fill in the PR description to cover what is included in this PR, since it looks like a combination of functional changes, refactoring, tests, and benchmarks. I am wondering if would be better to just do the refactoring in this PR and then follow up with a separate PR for the functional changes?

@coderfender
Copy link
Contributor Author

coderfender commented Feb 12, 2026

@andygrove , thank you for the review.
The Utf8 addition to can_cast_from_boolean changes cast_supported's return value (to true) for Boolean→String. However, cast_supported is not called anywhere on main except recursive calls for structs which could have been the reason this bug was not surfaced in the first place . This addition makes it consistent with is_datafusion_spark_compatible macro both of which already support Boolean -> String

Flow on main branch :

  1. is_datafusion_spark_compatible -> matches! macro (for simple types)
  2. can_cast_from_boolean function is only called for struct / recursive casts (which missed cast support to Utf8)

Feature

  1. can_cast_from_boolean acts as a single source of truth (with identity cast support) for all casts from boolean type

Let me create a new PR just for the functional changes / bug fixes to keep changes isolated. However, it might be easier to first merge the functional changes and then continue refactoring IMO

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants