fix: Avoid precision loss for atan2 with integer args#22516
Conversation
|
Would we see similar issues for other math functions, meaning if we follow this approach we'd have to change math functions to operate predominantly on f64? |
We wouldn't. The main thing we need to avoid is using f32 for integer inputs. We can either do that by just removing f32 support, or ensuring that we use the f64 path for integer inputs. This PR initially did the former, but on closer examination, I see that quite a few DataFusion numerical functions have a float32 code path. So it is probably more consistent to switch to the second approach, which I've now done. |
alamb
left a comment
There was a problem hiding this comment.
Makes sense to me -- thank you @neilconway
| &self.signature | ||
| } | ||
|
|
||
| fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> { |
There was a problem hiding this comment.
we could simplify as so too
fn return_type(&self, arg_types: &[DataType]) -> Result<DataType> {
Ok(arg_types[0].clone())
}since args are already coerced by this point
There was a problem hiding this comment.
Personally I'd prefer to be explicit, rather than depending on the coupling / timing dependency between type coercion and return_type. There's enough subtle behavior in this code as it is...
Which issue does this PR close?
atan2precision loss with integer arguments #22514.Rationale for this change
atan2defined two input signatures:(Float32, Float32)and(Float64, Float64)(in that order). That meant that integer inputs were coerced intoFloat32values, which lead to incorrect results:atan2(1, 1000000)resulted in less precision thanatan2(1.0, 1000000.0); the results for the former were also inconsistent with the behavior ofatan2in Postgres and DuckDB.Fix this by only using the
Float32path when given twoFloat32inputs; for other inputs, we should useFloat64. This avoids rounding for large integer inputs (Float32has only 24 mantissa bits, so larger integers would get rounded).What changes are included in this PR?
atan2signature to only take theFloat32code path for twoFloat32inputsAre these changes tested?
Yes, new test added.
Are there any user-facing changes?
Yes: the return type and semantics of
atan2in some circumstances has changed.atan2will now only be computed inFloat32when passed twoFloat32values. In all other cases, the computation will be done inFloat64and aFloat64value will be returned.