[spark] Support batch union read for lake-enabled primary key tables #3042

Open
Yohahaha wants to merge 7 commits into apache:main from Yohahaha:batch-union-read-pk

@Yohahaha
Contributor

@Yohahaha Yohahaha commented Apr 9, 2026

Purpose

Linked issue: close #2984

This PR adds support for reading lake-enabled primary key tables in Spark SQL.

Brief change log

  • Move LakeSnapshotAndLogSplitScanner from the fluss-flink module to the fluss-client module so it can be reused across connectors
  • Refactor LakeSnapshotAndLogSplitScanner to be more generic by decoupling it from the Flink-specific LakeSnapshotAndFlussLogSplit class
  • Add Spark lake upsert read support
  • Minor refactoring for test files
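
For readers unfamiliar with the idea behind a union read on a primary key table, the following is a minimal conceptual sketch, not the code in this PR. It assumes a union read combines the rows already in the lake snapshot with the change-log records written after that snapshot, and that the latest change per primary key wins. `UnionReadSketch`, `LogRecord`, `ChangeKind`, and `unionRead` are hypothetical names for illustration and are not Fluss or Spark APIs.

```java
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical illustration only; none of these types exist in Fluss.
final class UnionReadSketch {

    /** Kind of change carried by a log record (assumed for this sketch). */
    enum ChangeKind { UPSERT, DELETE }

    /** One change-log record for a single primary key. */
    static final class LogRecord {
        final String key;
        final String value;
        final ChangeKind kind;

        LogRecord(String key, String value, ChangeKind kind) {
            this.key = key;
            this.value = value;
            this.kind = kind;
        }
    }

    /**
     * Merges lake snapshot rows with log records that arrived after the
     * snapshot. Log records are applied in order, so the last change for
     * a key determines its final state.
     */
    static Map<String, String> unionRead(
            Map<String, String> lakeSnapshot, List<LogRecord> logRecords) {
        // Start from a copy of the snapshot; TreeMap keeps output ordered by key.
        Map<String, String> merged = new TreeMap<>(lakeSnapshot);
        for (LogRecord record : logRecords) {
            if (record.kind == ChangeKind.DELETE) {
                merged.remove(record.key);
            } else {
                merged.put(record.key, record.value);
            }
        }
        return merged;
    }
}
```

With a snapshot holding keys `k1` and `k2`, a log that upserts `k2`, deletes `k1`, and inserts `k3` yields a result containing only the updated `k2` and the new `k3`. A real scanner would stream and sort-merge both sources rather than materialize them in memory.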

Tests

  • Added SparkLakePrimaryKeyTableReadTestBase with Paimon.

API and Format

No API or format changes.

Documentation

No new feature documentation required (extends existing lake reading capability).

@Yohahaha Yohahaha changed the title [spark] Support reading lake-enabled primary key tables in Spark connector [spark] Support reading lake-enabled primary key tables Apr 9, 2026
@Yohahaha Yohahaha changed the title [spark] Support reading lake-enabled primary key tables [spark] Support batch union read for lake-enabled primary key tables Apr 9, 2026
@Yohahaha
Contributor Author

CI failed due to known issue #2992.

@Yohahaha
Contributor Author

@YannByron, could you help review this PR?

Development

Successfully merging this pull request may close these issues.

[spark] Batch union read for pk table
