Skip to content

Conversation

@kumarUjjawal
Copy link
Contributor

Which issue does this PR close?

Rationale for this change

The signum function currently converts scalar inputs to arrays before processing, even for single scalar values. This adds unnecessary overhead from array allocation and conversion. Adding a scalar fast path avoids this overhead and improves performance for constant folding and scalar expression evaluation.

What changes are included in this PR?

  • Added scalar fast path for float32 and float64
Type Before After Speedup
signum_f64_scalar 266 ns 54 ns 4.9x
signum_f32_scalar 263 ns 55 ns 4.8x

Are these changes tested?

Yes

Are there any user-facing changes?

No

@github-actions github-actions bot added the functions Changes to functions implementation label Jan 18, 2026
}

// Array path
make_scalar_function(signum, vec![])(&args.args)
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

this should have handled that optimization, no?

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If my interpretation is correct, you are asking: To add scalar optimization inside make_scalar_function? To do that we would need to change the signature to also accept a scalar function, which would be a larger refactor. If you meant that Doesn't make_scalar_function already handle scalar optimization? Then no we still need to convert scalars to arrays first. We have used the inline path in other parts of the optimization too.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can technically use make_scalar_function with the correct hints, but we might be trying to move away from that function, see:

Comment on lines 103 to 104
// Scalar fast path for float types - avoid array conversion overhead
if let ColumnarValue::Scalar(scalar) = arg {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't need to repeat this comment about fast paths each time (not to mention specifying it for "float types" is confusing considering the function signature already limits the inputs to float types). So it can actually be a bit misleading as it might imply we omit fast path for non-float types. We're better off removing the comment.

Copy link
Contributor Author

@kumarUjjawal kumarUjjawal Jan 19, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks for pointing that out.

}
}

// Array path
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We might as well change the if let to a match statement, and inline the contents of signum here to avoid use of make_scalar_function to simplify the code

},
),
))),
other => exec_err!("Unsupported data type {other:?} for function signum"),
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: this should be internal error to be consistent with scalar path above

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you provide a basic mental model for when I should use exec_err and when internal_err? Is there any documentation for this?

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

exec err -> things that can happen in normal execution, such as invalid value to a function (e.g. trying to get ascii character from an integer input, and we input a value that doesnt have a corresponding character like 99999)

internal err -> things that shouldn't normally happen, aka occur if some other bug in datafusion allowed this code path to occur

in this case, the signature should already guard us to only have f32/f64 inputs; therefore if at this point we find an array not of that type, then something went wrong in type coercion/signature code and its an internal bug

Copy link
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks @Jefffrey

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

functions Changes to functions implementation

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants