Skip to content

Add Apache Iceberg load and export support#236

Merged
vustef merged 46 commits intomainfrom
mjs-add-iceberg-export
Apr 1, 2026
Merged

Add Apache Iceberg load and export support#236
vustef merged 46 commits intomainfrom
mjs-add-iceberg-export

Conversation

@mjschleich
Copy link
Copy Markdown
Contributor

@mjschleich mjschleich commented Mar 19, 2026

Summary

  • Add ExportIcebergConfig message to transactions.proto with fields: locator, config (warehouse/token/credential), schema, prefix, target_file_size_bytes, and compression
  • For load support, we added IcebergData node, reusing GNFColumns from CSVData.
  • Extend the Export oneof to support Iceberg alongside CSV
  • Add grammar rules, helper functions, and test coverage for the new export type across all three SDKs (Python, Go, Julia)

Test plan

  • make protobuf — regenerated bindings for Python, Go, Julia
  • make parsers / make printers — regenerated from grammar
  • make update-bins / make update-snapshots — test fixtures updated
  • make test — all tests pass (Python 686, Julia 31509, Go ok)

Add ExportIcebergConfig to the Export oneof in transactions.proto with
fields: catalog_uri, namespace, table_name, catalog_properties
(warehouse, token, credential), schema, prefix, target_file_size_bytes,
and compression. Grammar rules, parsers, printers, and tests are
included for all three SDKs (Python, Go, Julia).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@mjschleich mjschleich marked this pull request as draft March 19, 2026 21:16
@vustef vustef changed the title Add Apache Iceberg export support Add Apache Iceberg load and export support Mar 26, 2026
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reviewers, I'm not sure if this file is supposed to be generated...

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's not (yet). We do want to extend ProtoBuf.jl to generate hash and equality as well, but this is not supported yet.

Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

# or appear in hashed structures (e.g. Dict keys, caches).
Base.:(==)(a::IcebergConfig, b::IcebergConfig) =
a.catalog_uri == b.catalog_uri && a.scope == b.scope && a.properties == b.properties
Base.hash(a::IcebergConfig, h::UInt) = hash(a.properties, hash(a.scope, hash(a.catalog_uri, h)))
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@reviewers: I omitted auth_properties from hash and ==, thinking they shouldn't define object's identity. But this then influences that tests don't take this into account either. Not sure if this is fine, nor required.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@comnik please check this question too

Copy link
Copy Markdown
Contributor Author

@mjschleich mjschleich left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Generally this looks good to me, just noticed two small issues

Comment on lines +129 to +131
RelationId table_def = 3;
repeated ExportIcebergColumn columns = 4;
optional string prefix = 5;
Copy link
Copy Markdown
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

table_def and prefix may need a comment explaining what they are for

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@gbrgr what is prefix?

Copy link
Copy Markdown

@gbrgr gbrgr Mar 31, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's just the default file name prefix used for parquet files during exports. Edit: maybe rename it to file_prefix.

vustef added 6 commits March 30, 2026 15:26
…dtrip tests pass but only because we use *** as values in the tests (meaning we don't necesarily see whether we obfuscate values of auth_properties or not)
@vustef vustef requested a review from comnik March 30, 2026 18:13
@vustef vustef enabled auto-merge (squash) April 1, 2026 08:20
@vustef vustef merged commit 34d9ca2 into main Apr 1, 2026
5 checks passed
@vustef vustef deleted the mjs-add-iceberg-export branch April 1, 2026 08:20
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants