
Change bigquery to compileonly #15655

Merged
rdblue merged 1 commit into apache:main from rambleraptor:fix-dependencies
Apr 1, 2026

@rambleraptor
Contributor

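This fixes the transitive dependency issue the BigQuery module is facing by declaring google-cloud-bigquery and google-cloud-core as compileOnly instead of implementation dependencies.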
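For context, here is a rough model of why switching these dependencies to compileOnly keeps them out of the runtime bundles. In Gradle, implementation dependencies are part of a module's runtime classpath and flow transitively to consumers at runtime (and so into shaded JARs), whereas compileOnly dependencies are visible only while compiling the module itself. The Python sketch below encodes that rule on a hypothetical module graph. It is a simplification for illustration, not Gradle's resolution algorithm or API.

```python
from collections import deque

# Simplified model: only these configurations put a dependency on a
# consumer's runtime classpath. "compileOnly" deliberately is not listed.
RUNTIME_CONFIGS = {"implementation", "api", "runtimeOnly"}


def runtime_closure(graph, root):
    """Return every module that would end up on root's runtime classpath."""
    seen = set()
    queue = deque([root])
    while queue:
        module = queue.popleft()
        for config, dep in graph.get(module, []):
            if config in RUNTIME_CONFIGS and dep not in seen:
                seen.add(dep)
                queue.append(dep)
    return seen


# Hypothetical module graphs: module -> list of (configuration, dependency).
before = {
    "spark-runtime": [("implementation", "iceberg-bigquery")],
    "iceberg-bigquery": [
        ("implementation", "google-cloud-bigquery"),
        ("implementation", "google-cloud-core"),
    ],
}
after = {
    "spark-runtime": [("implementation", "iceberg-bigquery")],
    "iceberg-bigquery": [
        ("compileOnly", "google-cloud-bigquery"),
        ("compileOnly", "google-cloud-core"),
    ],
}

print(sorted(runtime_closure(before, "spark-runtime")))
# ['google-cloud-bigquery', 'google-cloud-core', 'iceberg-bigquery']
print(sorted(runtime_closure(after, "spark-runtime")))
# ['iceberg-bigquery']
```

The real google-cloud-bigquery artifact has many transitive dependencies of its own, which this toy graph omits.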

@github-actions github-actions bot added the build label Mar 16, 2026
@kevinjqliu
Contributor

Is there a way to validate this? Something similar to Ryan's comment on the original PR #14221 (comment).

-  implementation "com.google.cloud:google-cloud-bigquery"
-  implementation "com.google.cloud:google-cloud-core"
+  compileOnly "com.google.cloud:google-cloud-bigquery"
+  compileOnly "com.google.cloud:google-cloud-core"
Contributor


Can you confirm that this works with the Google Cloud bundle that is currently produced?

Contributor Author


I did some sanity testing:

  • Ran Spark with iceberg-bigquery and iceberg-spark-runtime to query a BigQuery table. This failed because it required dependencies from iceberg-gcp-bundle (for authentication, among others).
  • Ran Spark with iceberg-bigquery, iceberg-spark-runtime, and iceberg-gcp-bundle. This worked as expected.

This tells me the bundles are pulling in the dependencies we expect them to.

@rambleraptor
Contributor Author

@kevinjqliu I'm mostly doing sanity testing by running queries with and without iceberg-gcp-bundle and expecting the queries without it to fail. If we can query BigQuery without including iceberg-gcp-bundle, it means we're unexpectedly pulling in dependencies that should come from that bundle.
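The runtime check described above can be complemented by a static one. Below is a minimal Python sketch, not how this PR was actually validated, that lists the entries in a JAR matching packages which, by assumption, should only be supplied by iceberg-gcp-bundle. The package fragments in BUNDLE_ONLY_FRAGMENTS and the make_jar helper are hypothetical illustrations, not part of the Iceberg build.

```python
import io
import zipfile

# Assumption: path fragments of classes that should come from
# iceberg-gcp-bundle and therefore should NOT appear in iceberg-bigquery or a
# Spark runtime JAR. Substring matching also catches relocated (shaded) copies.
BUNDLE_ONLY_FRAGMENTS = ("com/google/cloud/bigquery/", "com/google/auth/")


def leaked_entries(jar_file, fragments=BUNDLE_ONLY_FRAGMENTS):
    """Return the JAR entries that look like they belong to the GCP bundle."""
    with zipfile.ZipFile(jar_file) as jar:
        return sorted(
            name for name in jar.namelist()
            if any(fragment in name for fragment in fragments)
        )


def make_jar(names):
    """Build a tiny in-memory JAR (a ZIP archive) with empty entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as jar:
        for name in names:
            jar.writestr(name, b"")
    buf.seek(0)
    return buf


clean = make_jar(["org/apache/iceberg/gcp/bigquery/BigQueryProperties.class"])
leaky = make_jar([
    "org/apache/iceberg/gcp/bigquery/BigQueryProperties.class",
    "com/google/cloud/bigquery/BigQuery.class",
])
print(leaked_entries(clean))  # []
print(leaked_entries(leaky))  # ['com/google/cloud/bigquery/BigQuery.class']
```

Against a real build, pass the path of the built JAR instead of an in-memory one (for example a placeholder like leaked_entries("path/to/runtime.jar")). An empty result is consistent with correct dependency scoping but does not prove it, since a leak could also come through packages that are not in the fragment list.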

@rambleraptor rambleraptor requested a review from rdblue March 27, 2026 17:56
@jbonofre
Member

This PR changes what is "shaded" into artifacts (such as spark-runtime).

I have to check and update the LICENSE/NOTICE files in those artifacts. I can do that in a follow-up PR.

@jbonofre jbonofre self-requested a review March 31, 2026 14:59
@rdblue
Contributor

rdblue commented Apr 1, 2026

@rambleraptor, can you please dump the dependencies that were shaded into the runtime JARs before BigQuery was added and after this commit, so we can compare them? If we can't compare them, I'll open a PR to remove the new GCP JAR from the runtime bundles.

@rdblue rdblue added this to the Iceberg 1.11.0 milestone Apr 1, 2026
@RussellSpitzer
Member

RussellSpitzer commented Apr 1, 2026

I asked Cursor to check the state of the runtime JARs at three points in time: before the BigQuery change, before this PR, and after this PR.

No-BigQuery is the runtime JAR built without the iceberg-bigquery module at all.


Three-Way JAR Comparison (Spark 4.1 Runtime)

|  | No-BigQuery | Pre-#15655 (current) | Post-#15655 (fixed) |
|---|---|---|---|
| Size | 50 MB | 71 MB | 50 MB |
| Files | 29,011 | 37,015 | 29,019 |
| Delta vs. No-BigQuery | baseline | +21 MB / +8,004 files | +8 files |
| MD5 | ed7fe76911385b2bb71d0786bae6d057 | 1ab287ee7c480f176b8304fda8d2e3d6 | 8a886148d3218df942e99b52be063183 |

No-BigQuery vs Post-#15655: 8 New Files

After PR #15655, the only difference from the pre-BigQuery baseline is the iceberg-bigquery module's own compiled classes:

org/apache/iceberg/gcp/bigquery/
org/apache/iceberg/gcp/bigquery/BigQueryMetastoreCatalog.class
org/apache/iceberg/gcp/bigquery/BigQueryMetastoreClient.class
org/apache/iceberg/gcp/bigquery/BigQueryMetastoreClientImpl$1.class
org/apache/iceberg/gcp/bigquery/BigQueryMetastoreClientImpl.class
org/apache/iceberg/gcp/bigquery/BigQueryMetastoreUtils.class
org/apache/iceberg/gcp/bigquery/BigQueryProperties.class
org/apache/iceberg/gcp/bigquery/BigQueryTableOperations.class

Script: https://gist.github.com/RussellSpitzer/db182c5d68943de0d5c11a790aad9f7b
Output: https://gist.github.com/RussellSpitzer/9ef19b088343225692c904d78651e39d
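The gist above contains the script that was actually used. As an independent illustration of the same kind of comparison, here is a minimal Python sketch that summarizes two JARs (size, entry count, MD5) and lists the entries added or removed relative to a baseline. The function names and the toy JARs built in the demo are hypothetical; to use it, point compare_jars at real runtime JAR paths.

```python
import hashlib
import os
import tempfile
import zipfile


def jar_summary(path):
    """Return size in bytes, entry count, MD5 digest and entry names of a JAR."""
    digest = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    with zipfile.ZipFile(path) as jar:
        # Directory entries such as "org/apache/iceberg/gcp/bigquery/" count too.
        names = set(jar.namelist())
    return {
        "size": os.path.getsize(path),
        "files": len(names),
        "md5": digest.hexdigest(),
        "names": names,
    }


def compare_jars(baseline_path, candidate_path):
    """Describe how candidate_path differs from baseline_path."""
    base = jar_summary(baseline_path)
    cand = jar_summary(candidate_path)
    return {
        "size_delta": cand["size"] - base["size"],
        "files_delta": cand["files"] - base["files"],
        "added": sorted(cand["names"] - base["names"]),
        "removed": sorted(base["names"] - cand["names"]),
    }


def write_jar(path, names):
    """Write a toy JAR (a ZIP archive) with empty entries, for demonstration."""
    with zipfile.ZipFile(path, "w") as jar:
        for name in names:
            jar.writestr(name, b"")


with tempfile.TemporaryDirectory() as tmp:
    baseline = os.path.join(tmp, "no-bigquery.jar")
    candidate = os.path.join(tmp, "post-15655.jar")
    write_jar(baseline, ["org/apache/iceberg/Table.class"])
    write_jar(candidate, [
        "org/apache/iceberg/Table.class",
        "org/apache/iceberg/gcp/bigquery/",
        "org/apache/iceberg/gcp/bigquery/BigQueryProperties.class",
    ])
    result = compare_jars(baseline, candidate)

print(result["files_delta"])  # 2
print(result["added"])
```

Comparing entry names rather than only sizes or checksums is what makes it possible to say that the only difference is the iceberg-bigquery module's own classes.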

@rambleraptor
Contributor Author

@RussellSpitzer that's nifty! Fun use of Cursor.

I dumped the list of dependencies before/after this PR into this doc. The list of dependencies after this PR is dramatically shorter.

@rambleraptor
Contributor Author

Here's a Gist showing the iceberg-bigquery JAR before/after this change along with the diff.

@rambleraptor
Contributor Author

rambleraptor commented Apr 1, 2026

https://gist.github.com/rambleraptor/37092562e194a7c38220d72ab05bbc7c

I ran the dependency tree for Spark 4.0 / Scala 2.13 on commit 5c6629e (the parent of #14221) and compared it against Spark 4.0 / Scala 2.13 on this PR. They appear to be the same (apart from some version bumps), except for the addition of com.aliyun:credentials-java.
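One way to reproduce this kind of comparison while ignoring version bumps is to reduce each dependency tree to its set of group:artifact coordinates and diff the sets. Below is a minimal Python sketch, assuming the trees are the text output of Gradle's dependencies task (for example with --configuration runtimeClasspath). The sample excerpts and their versions are invented placeholders, not the contents of the gist above.

```python
import re

# Matches "group:artifact" right after the ASCII tree prefix that Gradle's
# dependencies task prints ("+--- " or "\--- "). The version is not captured.
COORDINATE = re.compile(r"[+\\]--- ([\w.\-]+):([\w.\-]+)")


def coordinates(tree_text):
    """Set of group:artifact pairs in a dependency tree, ignoring versions."""
    return {f"{m.group(1)}:{m.group(2)}" for m in COORDINATE.finditer(tree_text)}


def diff_trees(before_text, after_text):
    """Return (added, removed) coordinates between two dependency trees."""
    before, after = coordinates(before_text), coordinates(after_text)
    return sorted(after - before), sorted(before - after)


# Toy excerpts with placeholder versions; not real output from either commit.
before = r"""
+--- com.google.cloud:google-cloud-storage:1.0.0
|    \--- com.google.cloud:google-cloud-core:1.0.0
\--- org.slf4j:slf4j-api:1.0.0
"""
after = r"""
+--- com.google.cloud:google-cloud-storage:1.1.0
|    \--- com.google.cloud:google-cloud-core:1.0.0 -> 1.1.0 (*)
+--- com.aliyun:credentials-java:1.0.0
\--- org.slf4j:slf4j-api:1.0.0
"""

added, removed = diff_trees(before, after)
print(added)    # ['com.aliyun:credentials-java']
print(removed)  # []
```

Because only coordinates are compared, a changed version (or a conflict-resolution arrow such as 1.0.0 -> 1.1.0) does not show up as a difference, which matches the caveat about version bumps in the comment.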

@rdblue
Copy link
Copy Markdown
Contributor

rdblue commented Apr 1, 2026

Thanks for posting the dependency validation. I'll follow up with a PR to fix Aliyun.

@rdblue rdblue merged commit ff298a6 into apache:main Apr 1, 2026
35 checks passed
@rdblue
Contributor

rdblue commented Apr 1, 2026

Thanks, @rambleraptor!


5 participants