
feat(spark): add concat_ws with array support #20928

Draft
davidlghellin wants to merge 2 commits into apache:main from davidlghellin:feat/concat_ws


@davidlghellin
Contributor

Which issue does this PR close?

Part of #15914

Rationale for this change

DataFusion core's concat_ws does not support array arguments. Spark's concat_ws(sep, ...) accepts both scalar strings and arrays of strings as value arguments, expanding array elements into the result and skipping nulls. Supporting this is needed for Spark compatibility in the datafusion-spark crate.

What changes are included in this PR?

  • New SparkConcatWs UDF in datafusion/spark/src/function/string/concat_ws.rs
    • Supports concat_ws(sep, str1, str2, ...) with scalar strings
    • Supports array arguments: concat_ws(',', array('a', 'b'), 'c') returns "a,b,c"
    • Null scalars and null array elements are skipped (Spark behavior)
    • Null separator returns NULL
    • With zero value arguments, concat_ws(',') returns an empty string
    • Supports Utf8, LargeUtf8, Utf8View, List, and LargeList types
  • Registered the function in mod.rs (make_udf_function!, export_functions!, functions())
  • Replaced commented-out SLT tests with 14 working test cases covering basic usage, arrays, mixed arguments, nulls, column expressions, and edge cases
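The per-row semantics listed above can be sketched in plain Rust, independent of Arrow. This is a minimal illustration under stated assumptions, not the PR's SparkConcatWs code: the Arg enum and spark_concat_ws function are hypothetical names, None stands in for SQL NULL, and a NULL array as a whole is assumed to be skipped just like a NULL scalar (which matches Spark's behavior).

```rust
/// One value argument to concat_ws for a single row (hypothetical model, not Arrow types).
/// `None` stands for SQL NULL: a NULL scalar, or a NULL array as a whole.
#[derive(Debug, Clone)]
enum Arg {
    /// Stands in for a string scalar (Utf8 / LargeUtf8 / Utf8View in the PR).
    Scalar(Option<String>),
    /// Stands in for an array of strings (List / LargeList in the PR); elements may be NULL.
    Array(Option<Vec<Option<String>>>),
}

/// Spark-style concat_ws(sep, args...) for one row:
/// - a NULL separator yields NULL;
/// - NULL scalars, NULL arrays, and NULL array elements are skipped;
/// - array elements are expanded in order;
/// - with no surviving values (including zero value arguments) the result is "".
fn spark_concat_ws(sep: Option<&str>, args: &[Arg]) -> Option<String> {
    // A NULL separator short-circuits the whole result to NULL.
    let sep = sep?;
    let mut parts: Vec<&str> = Vec::new();
    for arg in args {
        match arg {
            Arg::Scalar(Some(s)) => parts.push(s.as_str()),
            // Flatten the array in order, dropping NULL elements.
            Arg::Array(Some(items)) => parts.extend(items.iter().filter_map(|e| e.as_deref())),
            // A NULL scalar or a NULL array contributes nothing.
            Arg::Scalar(None) | Arg::Array(None) => {}
        }
    }
    Some(parts.join(sep))
}
```

For example, spark_concat_ws(Some(","), &[Arg::Array(Some(vec![Some("a".into()), Some("b".into())])), Arg::Scalar(Some("c".into()))]) yields Some("a,b,c"), mirroring concat_ws(',', array('a', 'b'), 'c'). Collecting borrowed &str parts and joining once avoids building intermediate strings per argument.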

Are these changes tested?

Yes.

  • 7 unit tests in concat_ws.rs (basic, null values skipped, null separator, list arrays, list with nulls, mixed scalar+list, multiple rows)
  • 14 SLT tests in spark/string/concat_ws.slt covering scalars, arrays, nulls, column expressions, and edge cases
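The "multiple rows" and column-expression cases evaluate the function row by row across a batch. A hypothetical batch-level sketch, again using plain Vecs in place of Arrow arrays, shows how a scalar argument can be broadcast to every row while string and list columns are read per row. Column, concat_ws_batch, and the Column::Scalar broadcast variant are illustrative names, not the PR's API, and every column is assumed to have the same number of rows as the separator column.

```rust
/// One value argument evaluated over a batch (hypothetical; plain Vecs stand in for Arrow arrays).
/// `None` at any level stands for SQL NULL.
enum Column {
    /// A single scalar, broadcast to every row.
    Scalar(Option<String>),
    /// A string column with one entry per row.
    Utf8(Vec<Option<String>>),
    /// A list column with one (possibly NULL) list per row.
    List(Vec<Option<Vec<Option<String>>>>),
}

/// Row-by-row Spark-style concat_ws over a batch.
/// Assumes every Utf8/List column has `separators.len()` rows (indexing panics otherwise).
fn concat_ws_batch(separators: &[Option<String>], args: &[Column]) -> Vec<Option<String>> {
    (0..separators.len())
        .map(|row| -> Option<String> {
            // NULL separator for this row: the whole row is NULL.
            let sep = separators[row].as_deref()?;
            let mut parts: Vec<&str> = Vec::new();
            for col in args {
                match col {
                    // Option<&str> is iterable, so a NULL simply adds nothing.
                    Column::Scalar(v) => parts.extend(v.as_deref()),
                    Column::Utf8(values) => parts.extend(values[row].as_deref()),
                    Column::List(lists) => {
                        if let Some(items) = &lists[row] {
                            // Flatten this row's list, skipping NULL elements.
                            parts.extend(items.iter().filter_map(|e| e.as_deref()));
                        }
                    }
                }
            }
            Some(parts.join(sep))
        })
        .collect()
}
```

Checking the separator first keeps the NULL-separator rule cheap: a row with a NULL separator returns NULL without inspecting any value argument.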

Are there any user-facing changes?

No. This is a new function in the datafusion-spark crate only.

Copilot AI review requested due to automatic review settings March 13, 2026 16:57
github-actions bot added the labels sqllogictest (SQL Logic Tests, .slt) and spark on Mar 13, 2026

Copilot AI left a comment


Pull request overview

Adds Spark-compatible support for concat_ws in the DataFusion Spark shim, including array-argument expansion semantics, and replaces previously commented-out SLT examples with executable Spark SQLLogicTest coverage.

Changes:

  • Register new Spark UDF concat_ws in the Spark string function module.
  • Implement SparkConcatWs with support for array arguments and Spark-style null handling.
  • Expand concat_ws.slt into a broader suite of executable tests (scalar, arrays, columns, and edge cases).

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 3 comments.

Changed files:

  • datafusion/sqllogictest/test_files/spark/string/concat_ws.slt: Converts commented examples into runnable SLT queries and adds broader coverage for Spark concat_ws behavior.
  • datafusion/spark/src/function/string/mod.rs: Registers and exports the new Spark concat_ws UDF.
  • datafusion/spark/src/function/string/concat_ws.rs: New Spark-compatible concat_ws implementation (including array expansion) plus unit tests.


@davidlghellin marked this pull request as draft on March 14, 2026 16:45
