Is there a way to validate this? Something similar to Ryan's comment from the original PR #14221 (comment).
- implementation "com.google.cloud:google-cloud-bigquery"
- implementation "com.google.cloud:google-cloud-core"
+ compileOnly "com.google.cloud:google-cloud-bigquery"
+ compileOnly "com.google.cloud:google-cloud-core"
Can you confirm that this works with the Google Cloud bundle that is currently produced?
I did some sanity testing:
- Ran Spark with `iceberg-bigquery` and `iceberg-spark-runtime` to query a BigQuery table. This failed because it required dependencies from `iceberg-gcp-bundle` (for authentication, among other things).
- Ran Spark with `iceberg-bigquery`, `iceberg-spark-runtime`, and `iceberg-gcp-bundle`. This worked as expected.
This tells me the bundles are picking up the dependencies we expect them to.

@kevinjqliu I'm mostly sanity testing by running queries with and without `iceberg-gcp-bundle` and expecting failures when it is missing. If we can query BigQuery without including
This PR changes what is "shaded" into artifacts (such as spark-runtime). I have to check and update the
@rambleraptor, can you please dump the dependencies that were shaded into the runtime JARs before this was added and after this commit, so we can compare them? If we can't compare them, I'll open a PR to remove the new GCP JAR from the runtime bundles.
I asked Cursor to check the state of the runtime JARs at three different points in time. "No-BigQuery" is the runtime JAR built without the `iceberg-bigquery` module at all.

Three-Way JAR Comparison (Spark 4.1 Runtime)

No-BigQuery vs. Post-#15655: 8 New Files
After PR #15655, the only difference from the pre-BigQuery baseline is the

Script: https://gist.github.com/RussellSpitzer/db182c5d68943de0d5c11a790aad9f7b
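A minimal sketch of this kind of before/after JAR comparison, assuming only that runtime JARs are ordinary zip archives. `jar_entries`, `compare_jars`, and `build_jar` are hypothetical helpers for illustration; this is not the script from the linked Gist.

```python
import io
import zipfile


def jar_entries(jar) -> set[str]:
    """Return the file entries in a JAR (a path or file-like object), skipping directories."""
    with zipfile.ZipFile(jar) as zf:
        return {name for name in zf.namelist() if not name.endswith("/")}


def compare_jars(before, after) -> tuple[set[str], set[str]]:
    """Return (added, removed) entries between two builds of the same JAR."""
    old, new = jar_entries(before), jar_entries(after)
    return new - old, old - new


def build_jar(names) -> io.BytesIO:
    """Hypothetical demo helper: build an in-memory zip with empty entries."""
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        for name in names:
            zf.writestr(name, b"")
    buf.seek(0)
    return buf
```

With real artifacts this would be called as `compare_jars("before.jar", "after.jar")` (placeholder paths). Comparing entry names, rather than declared dependencies, shows what actually ended up inside the shaded JAR, including relocated classes.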
@RussellSpitzer, that's nifty! Fun use of Cursor. I dumped the lists of dependencies before and after this PR into this doc. The dependency list shrinks dramatically after this PR.
Here's a Gist showing the `iceberg-bigquery` JAR before and after this change, along with the diff.
https://gist.github.com/rambleraptor/37092562e194a7c38220d72ab05bbc7c

I ran the dependency tree for Spark 4.0 / Scala 2.13 on commit 5c6629e (the parent of #14221) and compared it against Spark 4.0 / Scala 2.13 on this PR. They appear to be the same (apart from some version bumps), other than the addition of
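The "same, apart from version bumps" check can be made mechanical by comparing `group:artifact` coordinates and ignoring versions. A minimal sketch, assuming the text output of Gradle's `dependencies` task, where each node looks like `+--- group:artifact:version` (possibly followed by `-> version` or `(*)`). `coordinates` and `diff_trees` are hypothetical names, and project dependencies such as `project :iceberg-api` are deliberately skipped.

```python
import re

# Matches one node of a Gradle-style dependency tree, e.g.
# "|    +--- com.google.cloud:google-cloud-core:2.39.0 -> 2.40.0 (*)".
# Only group and artifact are captured, so version bumps are ignored.
_NODE = re.compile(r"[+\\]--- ([^\s:]+):([^\s:]+)")


def coordinates(tree_text: str) -> set[str]:
    """Collect 'group:artifact' coordinates from a dependency-tree dump."""
    found = set()
    for line in tree_text.splitlines():
        match = _NODE.search(line)
        if match:
            found.add(f"{match.group(1)}:{match.group(2)}")
    return found


def diff_trees(before: str, after: str) -> tuple[set[str], set[str]]:
    """Return (added, removed) coordinates between two dumps, ignoring versions."""
    old, new = coordinates(before), coordinates(after)
    return new - old, old - new
```

Save the tree output for each commit to a file and pass the two texts to `diff_trees`; anything left in `added` or `removed` is a real change in what is pulled in, not just a version bump.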
Thanks for posting the dependency validation. I'll follow up with a PR to fix Aliyun. |
Thanks, @rambleraptor! |
This fixes the transitive dependency issue that BigQuery is facing.