-
Notifications
You must be signed in to change notification settings - Fork 1.9k
perf: Optimize signum scalar performance with fast path #19871
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
| } | ||
|
|
||
| // Array path | ||
| make_scalar_function(signum, vec![])(&args.args) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
this should have handled that optimization, no?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If my interpretation is correct, you are asking: To add scalar optimization inside make_scalar_function? To do that we would need to change the signature to also accept a scalar function, which would be a larger refactor. If you meant that Doesn't make_scalar_function already handle scalar optimization? Then no we still need to convert scalars to arrays first. We have used the inline path in other parts of the optimization too.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think we can technically use make_scalar_function with the correct hints, but we might be trying to move away from that function, see:
| // Scalar fast path for float types - avoid array conversion overhead | ||
| if let ColumnarValue::Scalar(scalar) = arg { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We don't need to repeat this comment about fast paths each time (not to mention specifying it for "float types" is confusing considering the function signature already limits the inputs to float types). So it can actually be a bit misleading as it might imply we omit fast path for non-float types. We're better off removing the comment.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for pointing that out.
| } | ||
| } | ||
|
|
||
| // Array path |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
We might as well change the if let to a match statement, and inline the contents of signum here to avoid use of make_scalar_function to simplify the code
Co-authored-by: Jeffrey Vo <[email protected]>
| }, | ||
| ), | ||
| ))), | ||
| other => exec_err!("Unsupported data type {other:?} for function signum"), |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: this should be internal error to be consistent with scalar path above
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can you provide a basic mental model for when I should use exec_err and when internal_err? Is there any documentation for this?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
exec err -> things that can happen in normal execution, such as invalid value to a function (e.g. trying to get ascii character from an integer input, and we input a value that doesnt have a corresponding character like 99999)
internal err -> things that shouldn't normally happen, aka occur if some other bug in datafusion allowed this code path to occur
in this case, the signature should already guard us to only have f32/f64 inputs; therefore if at this point we find an array not of that type, then something went wrong in type coercion/signature code and its an internal bug
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks @Jefffrey
Which issue does this PR close?
Rationale for this change
The signum function currently converts scalar inputs to arrays before processing, even for single scalar values. This adds unnecessary overhead from array allocation and conversion. Adding a scalar fast path avoids this overhead and improves performance for constant folding and scalar expression evaluation.
What changes are included in this PR?
float32andfloat64Are these changes tested?
Yes
Are there any user-facing changes?
No