perf: Optimize signum scalar performance with fast path #19871

Open

kumarUjjawal wants to merge 4 commits into apache:main from kumarUjjawal:perf/signum_scalar_path

+96 −32

Contributor

kumarUjjawal commented Jan 18, 2026

Which issue does this PR close?

Part of [EPIC] Optimize performance for slow expressions datafusion-comet#2986

Rationale for this change

The signum function currently converts scalar inputs to arrays before processing, even for single scalar values. This adds unnecessary overhead from array allocation and conversion. Adding a scalar fast path avoids this overhead and improves performance for constant folding and scalar expression evaluation.
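To make the overhead concrete, here is a minimal sketch of the two strategies. It deliberately uses a simplified stand-in enum (`Value`) rather than DataFusion's real `ColumnarValue`, `ScalarValue`, and Arrow array types, and the helper names (`sign`, `signum_via_array`, `signum_fast`) are hypothetical. The zero handling in `sign` is an assumption about the function's semantics, not a copy of the PR's code.

```rust
// Simplified stand-in for illustration only; NOT DataFusion's ColumnarValue.
#[derive(Debug, Clone, PartialEq)]
enum Value {
    // A single, possibly-null value.
    Scalar(Option<f64>),
    // A column of possibly-null values.
    Array(Vec<Option<f64>>),
}

// Assumed semantics: 0 for zero, otherwise -1 or 1 (NaN stays NaN).
fn sign(x: f64) -> f64 {
    if x == 0.0 {
        0.0
    } else {
        x.signum()
    }
}

// "Before": every input is first materialized as an array, so even a
// single scalar pays for an allocation and comes back as an array.
fn signum_via_array(arg: &Value) -> Value {
    let values: Vec<Option<f64>> = match arg {
        Value::Scalar(v) => vec![*v],
        Value::Array(vs) => vs.clone(),
    };
    Value::Array(values.into_iter().map(|v| v.map(sign)).collect())
}

// "After": scalars are handled directly and stay scalars; only real
// arrays go through the element-wise path.
fn signum_fast(arg: &Value) -> Value {
    match arg {
        Value::Scalar(v) => Value::Scalar(v.map(sign)),
        Value::Array(vs) => Value::Array(vs.iter().map(|v| v.map(sign)).collect()),
    }
}
```

In this sketch, `signum_fast(&Value::Scalar(Some(-2.5)))` yields a scalar without allocating, while `signum_via_array` builds a one-element vector for the same input.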

What changes are included in this PR?

Added scalar fast path for float32 and float64

| Benchmark | Before | After | Speedup |
|---|---|---|---|
| signum_f64_scalar | 266 ns | 54 ns | 4.9x |
| signum_f32_scalar | 263 ns | 55 ns | 4.8x |

Are these changes tested?

Yes

Are there any user-facing changes?

No


          perf: Optimize signum scalar performance with fast path

d528db9

github-actions bot added the functions label

rluvaton reviewed

datafusion/functions/src/math/signum.rs Outdated

    
                      }

                      // Array path

                      make_scalar_function(signum, vec![])(&args.args)

Member

rluvaton Jan 18, 2026

this should have handled that optimization, no?

Contributor Author

kumarUjjawal Jan 18, 2026

I can read this question two ways. If you mean we should add the scalar optimization inside make_scalar_function: that would require changing its signature to also accept a scalar function, which would be a larger refactor. If you mean "doesn't make_scalar_function already handle the scalar optimization?": no, it still converts scalars to arrays first. We have used the same inline scalar path in other functions as part of this optimization work.

Contributor

Jefffrey Jan 18, 2026

I think we can technically use make_scalar_function with the correct hints, but we might be trying to move away from that function, see:

Review the need of make_scalar_function for functions #14835

Jefffrey reviewed

datafusion/functions/src/math/signum.rs Outdated

Comment on lines 103 to 104

    
                      // Scalar fast path for float types - avoid array conversion overhead

                      if let ColumnarValue::Scalar(scalar) = arg {

Contributor

Jefffrey Jan 18, 2026

We don't need to repeat this comment about fast paths each time (not to mention specifying it for "float types" is confusing considering the function signature already limits the inputs to float types). So it can actually be a bit misleading as it might imply we omit fast path for non-float types. We're better off removing the comment.

Contributor Author

kumarUjjawal Jan 19, 2026

Thanks for pointing that out.

datafusion/functions/src/math/signum.rs Outdated

    
                          }

                      }

                      // Array path

Contributor

Jefffrey Jan 18, 2026

We might as well change the if let to a match statement, and inline the contents of signum here to avoid use of make_scalar_function to simplify the code

kumarUjjawal and others added 2 commits

January 18, 2026 22:04


          suggestion from jeffrey

85bef0f

Co-authored-by: Jeffrey Vo <[email protected]>


          remove comments and inline signum function

67eb641

Jefffrey approved these changes

datafusion/functions/src/math/signum.rs Outdated

    
                                      },

                                  ),

                              ))),

                              other => exec_err!("Unsupported data type {other:?} for function signum"),

Contributor

Jefffrey Jan 19, 2026

nit: this should be internal error to be consistent with scalar path above

Contributor Author

kumarUjjawal Jan 19, 2026

Can you provide a basic mental model for when I should use exec_err and when internal_err? Is there any documentation for this?

Contributor

Jefffrey Jan 19, 2026

exec err -> things that can happen in normal execution, such as an invalid value passed to a function (e.g. trying to get an ASCII character from an integer input, where the input is a value that doesn't have a corresponding character, like 99999)

internal err -> things that shouldn't normally happen, i.e. they occur only if some other bug in DataFusion allowed this code path to be reached

In this case, the signature should already guard us so that we only get f32/f64 inputs; therefore, if at this point we find an array that is not of that type, something went wrong in the type coercion/signature code and it's an internal bug
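The distinction can be sketched as follows. This is a stand-alone illustration, not DataFusion code: `SketchError`, `SketchType`, `ascii_char`, and `check_signum_input` are hypothetical names, and the two enum variants merely mirror the roles of the errors produced by `exec_err!` and `internal_err!`.

```rust
// Hypothetical error type standing in for the two error kinds discussed above.
#[derive(Debug, PartialEq)]
enum SketchError {
    // Can happen in normal execution: the call is well-typed, the value is bad.
    Execution(String),
    // Should never happen: reaching it means an earlier layer has a bug.
    Internal(String),
}

// Hypothetical, reduced set of data types.
#[derive(Debug, Clone, Copy, PartialEq)]
enum SketchType {
    Float32,
    Float64,
    Utf8,
}

// Execution error: an integer argument is valid input, but not every
// integer maps to an ASCII character (e.g. 99999).
fn ascii_char(code: u32) -> Result<char, SketchError> {
    if code < 128 {
        Ok(code as u8 as char)
    } else {
        Err(SketchError::Execution(format!(
            "{code} has no corresponding ASCII character"
        )))
    }
}

// Internal error: the signature is assumed to have restricted inputs to
// float types already, so any other type here points to a coercion bug.
fn check_signum_input(data_type: SketchType) -> Result<(), SketchError> {
    match data_type {
        SketchType::Float32 | SketchType::Float64 => Ok(()),
        other => Err(SketchError::Internal(format!(
            "Unexpected data type {other:?} for function signum"
        ))),
    }
}
```

A user can trigger the first error simply by passing 99999; the second can only be reached if the planner lets a non-float column through, which is why it is reported as internal.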

Contributor Author

kumarUjjawal Jan 19, 2026

Thanks @Jefffrey


          use internal error
