[SPARK-56332][SQL][TESTS] Use `sql.SparkSession` in `trait SQLTestData` by zhengruifeng · Pull Request #55162 · apache/spark

zhengruifeng · 2026-04-02T11:52:45Z

What changes were proposed in this pull request?

Use `sql.SparkSession` in `trait SQLTestData`.

Why are the changes needed?

This is needed for merging `SQLTestUtils` and `QueryTest`.

Does this PR introduce any user-facing change?

No, test-only

How was this patch tested?

CI

Was this patch authored or co-authored using generative AI tooling?

Co-authored-by: Claude code (Opus 4.6)

… SQLTestData Use `sql.SparkSession` instead of `classic.SparkSession` in `SQLTestData`. For datasets that require RDD-based creation (emptyTestData, testData, testData2, testData3, upperCaseData, lowerCaseData, lowerCaseDataWithDuplicates), cast to `classic.SparkSession`. For all other datasets, use `spark.createDataFrame` directly. Co-authored-by: Isaac

hvanhovell · 2026-04-02T15:29:18Z

sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala

  protected lazy val emptyTestData: DataFrame = {
-    val df = spark.sparkContext.parallelize(
-      Seq.empty[Int].map(i => TestData(i, i.toString))).toDF()
+    val df = spark.createDataFrame(


Can you please avoid using SparkContext.parallelize?

val df = spark.emptyDataset[TestData].toDF()
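For context, a minimal sketch of how the suggestion above might be written without `SparkContext.parallelize` and without importing session implicits. The `TestData(key, value)` shape, the explicit `Encoders.product` encoder, and the local-mode session are assumptions made for illustration; they are not taken from the PR diff.

```scala
import org.apache.spark.sql.{DataFrame, Encoder, Encoders, SparkSession}

// Assumed stand-in for the TestData case class used by SQLTestData.
case class TestData(key: Int, value: String)

object EmptyTestDataSketch {
  // Builds an empty DataFrame with TestData's schema, with no RDD involved.
  def emptyTestData(spark: SparkSession): DataFrame = {
    // Supply the encoder explicitly instead of relying on spark.implicits._
    implicit val enc: Encoder[TestData] = Encoders.product[TestData]
    spark.emptyDataset[TestData].toDF()
  }

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration.
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    val df = emptyTestData(spark)
    df.printSchema()
    spark.stop()
  }
}
```

Passing the encoder explicitly keeps the sketch independent of `SQLImplicits`, which appears to be the direction this PR is taking; inside the real trait a different encoder source may be more appropriate.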

hvanhovell · 2026-04-02T17:51:17Z

sql/core/src/test/scala/org/apache/spark/sql/test/SQLTestData.scala

-    val df = spark.sparkContext.parallelize(
-      (1 to 100).map(i => TestData(i, i.toString))).toDF()
+    val df = spark.createDataFrame(
+      spark.asInstanceOf[classic.SparkSession].sparkContext.parallelize(


zhengruifeng added 4 commits April 2, 2026 03:50

Remove internalImplicits from SQLTestData

1cf5903

Replace `.toDF()` on RDDs with `spark.createDataFrame(rdd)` and `$"..."` with `col("...")` to eliminate the SQLImplicits dependency. Co-authored-by: Isaac

Revert "Remove internalImplicits from SQLTestData"

5778cc2

This reverts commit 1cf5903.

Remove internalImplicits from SQLTestData

447d4e6

Replace `.toDF()` on RDDs with `spark.createDataFrame(rdd)` and `$"..."` with `col("...")` to eliminate the SQLImplicits dependency. Co-authored-by: Isaac
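The kind of rewrite these commit messages describe can be sketched as follows. This is an illustration under assumptions, not the PR's actual code: the `TestData(key, value)` shape is assumed, and the data is built from a local `Seq` rather than an RDD so that the sketch does not depend on `classic.SparkSession`.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.col

// Assumed stand-in for the TestData case class used by SQLTestData.
case class TestData(key: Int, value: String)

object NoImplicitsSketch {
  // Before (needs implicits):  data.toDF()
  // After (no implicits):      spark.createDataFrame(data)
  def testData(spark: SparkSession): DataFrame =
    spark.createDataFrame((1 to 100).map(i => TestData(i, i.toString)))

  // Before (needs implicits):  df.where($"key" > 50)
  // After (no implicits):      df.where(col("key") > 50)
  def largeKeys(df: DataFrame): DataFrame =
    df.where(col("key") > 50)

  def main(args: Array[String]): Unit = {
    // Local session purely for illustration.
    val spark = SparkSession.builder().master("local[1]").appName("sketch").getOrCreate()
    largeKeys(testData(spark)).show(5)
    spark.stop()
  }
}
```

`functions.col` resolves a column by name without the `$"..."` string interpolator, so no implicit conversions from the session need to be in scope.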

zhengruifeng requested review from cloud-fan and dongjoon-hyun April 2, 2026 11:53

zhengruifeng wants to merge 4 commits into apache:master from zhengruifeng:merge_sql_utils
