[SPARK-56328][SQL] Fix inline table collation handling for INSERT VALUES and DEFAULT COLLATION#55160

Open
ilicmarkodb wants to merge 1 commit into apache:master from ilicmarkodb:inline_talbe_collation

ilicmarkodb commented Apr 2, 2026

What changes were proposed in this pull request?

This PR fixes two related issues with how collations interact with inline tables (VALUES clauses):

1. Eager evaluation bypasses DEFAULT COLLATION for CREATE TABLE/VIEW

Inline tables are eagerly evaluated during parsing for performance. However, in a statement such as CREATE TABLE ... DEFAULT COLLATION UTF8_LCASE AS SELECT * FROM VALUES ('a') AS T(c1), the default collation must be applied to string literals during analysis. Because eager evaluation happens before analysis, the default collation was lost.

The fix adds canEagerlyEvaluateInlineTable, which prevents eager evaluation when the inline table is inside a CREATE TABLE/VIEW statement and contains string literals that need collation resolution.
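
To make the rule concrete, here is a minimal Python sketch of the decision described above. It is a toy model, not the Spark code: the `Literal` class, the function signature, and the `inside_create_table_or_view` flag are assumptions made for illustration, and the real check lives in Spark's Scala parser.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Literal:
    """Hypothetical stand-in for a parsed literal in a VALUES row."""
    value: object


def can_eagerly_evaluate_inline_table(rows, inside_create_table_or_view):
    """Return False when evaluation has to wait for the analyzer.

    Assumption: string literals are the only cells whose type depends on
    the default collation, so only they block eager evaluation.
    """
    has_string_literal = any(
        isinstance(cell.value, str) for row in rows for cell in row
    )
    return not (inside_create_table_or_view and has_string_literal)


# A string literal inside CREATE TABLE/VIEW is deferred to analysis.
print(can_eagerly_evaluate_inline_table([[Literal("a")]], True))
# The same literal outside CREATE TABLE/VIEW can still be evaluated eagerly.
print(can_eagerly_evaluate_inline_table([[Literal("a")]], False))
```

Under this model, inline tables without string literals keep the eager path even inside CREATE TABLE/VIEW, so the performance benefit is lost only where collation resolution is actually needed.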

2. INSERT INTO VALUES fails with INCOMPATIBLE_TYPES_IN_INLINE_TABLE for collated columns

When using INSERT INTO ... VALUES with collated columns, the inline table resolution could fail because values in the same column end up with different collations. This happens when:

  • ResolveColumnDefaultInCommandInputQuery resolves DEFAULT to a typed null with the target column's collation, which differs from other literals' collation
  • Explicit COLLATE on values produces mismatched collations across rows

The fix adds an ignoreCollation parameter to EvaluateUnresolvedInlineTable that strips collations from input types before finding the common type. This is safe for INSERT because the INSERT coercion will cast each value to the target column's type, including collation.
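
The effect of stripping collations before type unification can be sketched as follows. This is a simplified Python model under stated assumptions: `StringType`, `ArrayType`, `strip_collation`, and `common_column_type` are hypothetical names, and the "common type" here is plain equality, whereas Spark's real type coercion is considerably richer.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class StringType:
    collation: str = "UTF8_BINARY"


@dataclass(frozen=True)
class ArrayType:
    element: object


def strip_collation(data_type):
    """Replace every string collation with the default, recursing into arrays."""
    if isinstance(data_type, StringType):
        return StringType()
    if isinstance(data_type, ArrayType):
        return ArrayType(strip_collation(data_type.element))
    return data_type


def common_column_type(cell_types, ignore_collation=False):
    """Toy unification: all cells in a column must share one type."""
    if ignore_collation:
        cell_types = [strip_collation(t) for t in cell_types]
    distinct = set(cell_types)
    if len(distinct) != 1:
        raise TypeError("INCOMPATIBLE_TYPES_IN_INLINE_TABLE")
    return distinct.pop()


# One column holding a typed NULL from DEFAULT (UTF8_LCASE) and a plain literal.
column = [StringType("UTF8_LCASE"), StringType("UTF8_BINARY")]
print(common_column_type(column, ignore_collation=True))
```

Without `ignore_collation=True` the same column raises in this model, which mirrors the failure the PR describes. The stripped type is only an intermediate: the INSERT coercion later casts each value to the target column's type, collation included.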

The collation stripping is applied only when the inline table is the direct VALUES clause of an INSERT statement:

  • Parser path: isInlineTableInsideInsertValuesClause walks up the parser context tree to distinguish INSERT INTO t VALUES (...) from INSERT INTO t SELECT * FROM VALUES (...) AS T
  • Analyzer path: ResolveInlineTables pattern-matches InsertIntoStatement with a direct UnresolvedInlineTable query child
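
The parser-path idea of walking up from the inline table to decide whether it is the direct VALUES clause of an INSERT can be sketched like this. It is an illustration only: the `Ctx` class, the node kind labels, and the `TRANSPARENT` set are made up for this example and are not claimed to match the rule names or structure of Spark's ANTLR grammar.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Ctx:
    """Hypothetical parse-tree node with a link to its parent."""
    kind: str
    parent: Optional["Ctx"] = None


# Assumption: wrapper nodes that may sit between INSERT and its direct VALUES.
TRANSPARENT = {"query", "queryTerm", "queryPrimary"}


def is_inline_table_inside_insert_values_clause(ctx):
    """Walk up the parents; stop at the first node that is not a plain wrapper."""
    node = ctx.parent
    while node is not None:
        if node.kind == "insertInto":
            return True
        if node.kind not in TRANSPARENT:
            return False
        node = node.parent
    return False


insert = Ctx("insertInto")
# INSERT INTO t VALUES (...): only wrapper nodes between VALUES and INSERT.
direct = Ctx("inlineTable", Ctx("queryPrimary", Ctx("query", insert)))
# INSERT INTO t SELECT * FROM VALUES (...) AS T: a FROM clause intervenes.
nested = Ctx("inlineTable", Ctx("fromClause", Ctx("querySpecification", Ctx("query", insert))))

print(is_inline_table_inside_insert_values_clause(direct))
print(is_inline_table_inside_insert_values_clause(nested))
```

Stopping at the first non-wrapper node is what keeps the collation stripping scoped to the direct VALUES clause, so an inline table nested in a SELECT under the INSERT keeps the strict behavior.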

Standalone SELECT * FROM VALUES (...) and CTAS with conflicting explicit collations continue to fail as expected.

Why are the changes needed?

Without this fix:

-- Incorrect: DEFAULT COLLATION is not applied to inline table literals
CREATE TABLE t DEFAULT COLLATION UTF8_LCASE AS
  SELECT * FROM VALUES ('a'), ('b') AS T(c1) WHERE c1 = 'A';
-- Column c1 gets UTF8_BINARY instead of UTF8_LCASE

-- Fails: INCOMPATIBLE_TYPES_IN_INLINE_TABLE
CREATE TABLE t (c1 STRING COLLATE UTF8_LCASE, c2 STRING COLLATE UTF8_LCASE);
INSERT INTO t VALUES ('a', DEFAULT), (DEFAULT, DEFAULT);

Does this PR introduce any user-facing change?

Yes.

  • CREATE TABLE/VIEW ... DEFAULT COLLATION ... AS SELECT * FROM VALUES (...) now correctly applies the default collation to inline table literals.
  • INSERT INTO ... VALUES with collated columns now succeeds in cases that previously failed with INCOMPATIBLE_TYPES_IN_INLINE_TABLE.

How was this patch tested?

New tests in CollationSuite covering both eager and non-eager evaluation paths (EAGER_EVAL_OF_UNRESOLVED_INLINE_TABLE_ENABLED = true/false):

  • INSERT VALUES with NULLs, DEFAULT, explicit conflicting collations, mixed collations, nested types (ARRAY)
  • INSERT OVERWRITE with collated values
  • Collation stripping does not affect expression evaluation
  • Negative tests: standalone SELECT VALUES, INSERT SELECT FROM VALUES, and CTAS with conflicting collations still fail

New single-column inline table test variants in DefaultCollationTestSuite for CTAS and CREATE VIEW with DEFAULT COLLATION.

Was this patch authored or co-authored using generative AI tooling?

Generated-by: Claude Code

ilicmarkodb changed the title from "temp" to "[SPARK-56328][SQL] Fix INCOMPATIBLE_TYPES_IN_INLINE_TABLE for INSERT INTO VALUES with collated columns" Apr 2, 2026
ilicmarkodb changed the title from "[SPARK-56328][SQL] Fix INCOMPATIBLE_TYPES_IN_INLINE_TABLE for INSERT INTO VALUES with collated columns" to "[SPARK-56328][SQL] Fix inline table collation handling for INSERT VALUES and DEFAULT COLLATION" Apr 2, 2026
ilicmarkodb force-pushed the inline_talbe_collation branch 2 times, most recently from bc4dc9d to 0f5e5ed, April 2, 2026 17:01
ilicmarkodb force-pushed the inline_talbe_collation branch from 0f5e5ed to b173882, April 2, 2026 22:10