Core: Fix data loss in partial variant shredding by dirtysalt · Pull Request #15087 · apache/iceberg

dirtysalt · 2026-01-20T06:27:42Z

When using variantShreddingFunc to partially shred variant fields, unshredded fields were being lost during serialization. The bug was in the ShreddedObject constructor, where the local variable shreddedFields shadowed the instance field this.shreddedFields, so unshredded fields were added to the local map instead of the instance field.

This resulted in the binary value field containing only metadata headers without actual field data, causing IndexOutOfBoundsException on read and permanent data loss.

Fix: Changed all references in the problematic code block to use this.shreddedFields explicitly, ensuring unshredded fields are properly preserved in the instance field and serialized correctly. Added test case testPartialShreddingWithShreddedObject that reproduces the exact scenario from issue #15086.
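The shadowing described above is easy to reproduce outside Iceberg. The following is a minimal sketch and not the actual ShreddedObject code: the class names BuggyObject and FixedObject, the String-valued maps, and the use of a same-named constructor parameter as the shadowing variable are all assumptions made for illustration.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class ShadowingSketch {

  // Hypothetical stand-in for the buggy constructor: the parameter named
  // shreddedFields shadows the instance field of the same name.
  static final class BuggyObject {
    private final Map<String, String> shreddedFields;

    BuggyObject(Map<String, String> shreddedFields, Map<String, String> unshredded) {
      this.shreddedFields = new LinkedHashMap<>(shreddedFields);
      for (Map.Entry<String, String> entry : unshredded.entrySet()) {
        // BUG: the unqualified name resolves to the parameter, so the
        // instance field never receives the unshredded entries.
        shreddedFields.put(entry.getKey(), entry.getValue());
      }
    }

    Map<String, String> fields() {
      return shreddedFields; // no shadowing here: this is the instance field
    }
  }

  // Same constructor with the reference qualified by "this".
  static final class FixedObject {
    private final Map<String, String> shreddedFields;

    FixedObject(Map<String, String> shreddedFields, Map<String, String> unshredded) {
      this.shreddedFields = new LinkedHashMap<>(shreddedFields);
      for (Map.Entry<String, String> entry : unshredded.entrySet()) {
        this.shreddedFields.put(entry.getKey(), entry.getValue());
      }
    }

    Map<String, String> fields() {
      return shreddedFields;
    }
  }

  // Illustrative inputs: one shredded field and two unshredded ones.
  static Map<String, String> typed() {
    Map<String, String> map = new LinkedHashMap<>();
    map.put("id", "1000");
    return map;
  }

  static Map<String, String> unshredded() {
    Map<String, String> map = new LinkedHashMap<>();
    map.put("name", "user");
    map.put("city", "city");
    return map;
  }

  public static void main(String[] args) {
    // prints "buggy: [id]"
    System.out.println("buggy: " + new BuggyObject(typed(), unshredded()).fields().keySet());
    // prints "fixed: [id, name, city]"
    System.out.println("fixed: " + new FixedObject(typed(), unshredded()).fields().keySet());
  }
}
```

The compiler accepts both versions without complaint, which is why this kind of bug tends to surface only when the object is serialized or read back.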

nastra · 2026-01-20T06:59:54Z

@huaxingao @aihuaxu could you guys please take a look?

parquet/src/test/java/org/apache/iceberg/parquet/TestVariantWriters.java

core/src/main/java/org/apache/iceberg/variants/ShreddedObject.java

RussellSpitzer · 2026-01-21T16:16:16Z

parquet/src/test/java/org/apache/iceberg/parquet/TestVariantWriters.java

+      records.add(record);
+    }
+
+    // Shredding function that only shreds the "id" field


This is what I'm not sure about: this shouldn't be relevant to the fix, correct? We would lose the unshredded fields even if no transform is applied. The transform in the writer isn't relevant to the broken serialization method, correct?

dirtysalt · 2026-01-22

@RussellSpitzer I may not know the Iceberg code well enough, so pardon me if I'm wrong: I just ran some tests, observed the results, and tried to connect them into a reasonable explanation. My explanation may not be the root cause.

I think it's relevant to the fix. The bug is not "we lose the unshredded fields" but "we will lose shredded fields if we partially shred some fields". If I write the test case to shred all fields, as follows, then the test case works.

     VariantShreddingFunction partialShredding = (id, name) -> {
-      VariantMetadata shreddedMetadata = Variants.metadata("id");
-      ShreddedObject shreddedObject = Variants.object(shreddedMetadata);
-      shreddedObject.put("id", Variants.of(1234L));
-      return ParquetVariantUtil.toParquetSchema(shreddedObject);
+      if (name.equals("var")) {
+        ShreddedObject obj = Variants.object(metadata);
+        obj.put("id", Variants.of(1000L));
+        obj.put("name", Variants.of("user"));
+        obj.put("city", Variants.of("city"));
+        return ParquetVariantUtil.toParquetSchema(obj);
+      }
+      return null;
     };

But if I only partially shred, then the exception happens.

Actually, the fix should be the following, which shreds only the id field in the variant. But I guess it does not matter much, because the id field of primitive type does not support shredding.

--- a/parquet/src/test/java/org/apache/iceberg/parquet/TestVariantWriters.java
+++ b/parquet/src/test/java/org/apache/iceberg/parquet/TestVariantWriters.java
@@ -302,13 +302,16 @@ public class TestVariantWriters {
       records.add(record);
     }
 
-    // Shredding function that only shreds the "id" field
+    // Shredding function that only shreds the "id" field in the variant
     VariantShreddingFunction partialShredding = (id, name) -> {
-      VariantMetadata shreddedMetadata = Variants.metadata("id");
-      ShreddedObject shreddedObject = Variants.object(shreddedMetadata);
-      shreddedObject.put("id", Variants.of(1234L));
-      return ParquetVariantUtil.toParquetSchema(shreddedObject);
+      if (name.equals("var")) {
+        VariantMetadata shreddedMetadata = Variants.metadata("id");
+        ShreddedObject shreddedObject = Variants.object(shreddedMetadata);
+        shreddedObject.put("id", Variants.of(1234L));
+        return ParquetVariantUtil.toParquetSchema(shreddedObject);
+      }
+      return null;
     };

RussellSpitzer · 2026-01-22

My question was whether

// Shredding function that only shreds the "id" field
VariantShreddingFunction partialShredding = (id, name) -> null; // No Shredding

would also be broken, but from my internal test it looks like it isn't. So it must be going down a different serialization path, I guess.

RussellSpitzer · 2026-01-22

OK, I think I get it: the test code is essentially creating a fully shredded object at the top with

obj.put("id", Variants.of(1000L + i));
obj.put("name", Variants.of("user_" + i));
obj.put("city", Variants.of("city_" + i));

Then this code is transforming that object into a partially shredded one
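To make the "fully shredded object turned into a partially shredded one" idea concrete, here is a toy model that uses plain Java maps. It is only a sketch under stated assumptions: the names PartialShreddingSketch, Parts, split, and merge are hypothetical, and nothing here calls Iceberg's ShreddedObject or the Parquet writer. It shows only the invariant the fix restores: fields that are not shredded into a typed column must still be carried in the residual value, so that reading back yields the original object.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Set;

public class PartialShreddingSketch {

  // Hypothetical holder: fields written to typed columns versus fields
  // that stay behind in the residual (unshredded) value.
  static final class Parts {
    final Map<String, Object> typed = new LinkedHashMap<>();
    final Map<String, Object> residual = new LinkedHashMap<>();
  }

  // Route each field to the typed side if it is shredded, otherwise keep
  // it in the residual side. No field is dropped.
  static Parts split(Map<String, Object> fields, Set<String> shreddedNames) {
    Parts parts = new Parts();
    for (Map.Entry<String, Object> entry : fields.entrySet()) {
      if (shreddedNames.contains(entry.getKey())) {
        parts.typed.put(entry.getKey(), entry.getValue());
      } else {
        parts.residual.put(entry.getKey(), entry.getValue());
      }
    }
    return parts;
  }

  // Reading back: the object is the union of both sides.
  static Map<String, Object> merge(Parts parts) {
    Map<String, Object> merged = new LinkedHashMap<>(parts.typed);
    merged.putAll(parts.residual);
    return merged;
  }

  // Mirrors the three fields used in the test discussion.
  static Map<String, Object> sampleRow(int i) {
    Map<String, Object> row = new LinkedHashMap<>();
    row.put("id", 1000L + i);
    row.put("name", "user_" + i);
    row.put("city", "city_" + i);
    return row;
  }

  public static void main(String[] args) {
    Map<String, Object> row = sampleRow(0);
    Parts parts = split(row, Set.of("id")); // shred only "id"
    System.out.println("typed:    " + parts.typed);    // {id=1000}
    System.out.println("residual: " + parts.residual); // {name=user_0, city=city_0}
    System.out.println("round trip ok: " + merge(parts).equals(row)); // true
  }
}
```

In this model, the reported data loss corresponds to writing an empty residual map even though name and city were never shredded.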

dirtysalt · 2026-01-23

Yes, transforming that object into a partially shredded one.

creating a fully shredded object

I'm not sure if I want to create a fully shredded object at this moment, although the returned type here is indeed called ShreddedObject.

ShreddedObject obj = Variants.object(metadata);
obj.put("id", Variants.of(1000L + i));
obj.put("name", Variants.of("user_" + i));
obj.put("city", Variants.of("city_" + i));

I'm not sure this is the right (or expected) way to construct a variant, or whether I should construct a SerializedObject like what you wrote in this issue (#15086).

dirtysalt · 2026-01-27T10:43:46Z

@rdblue @RussellSpitzer @huaxingao do you have more comments, or is it okay to merge this PR?

amogh-jahagirdar · 2026-01-30T03:24:04Z

Thanks @dirtysalt, I did another pass and the new tests look right to me. I'm going to go ahead and merge. Thank you @aihuaxu @huaxingao @rdblue @RussellSpitzer for reviewing.

fix variant writer

9032acd

github-actions bot added parquet core labels Jan 20, 2026

nastra requested a review from huaxingao January 20, 2026 06:58

huaxingao reviewed Jan 20, 2026

parquet/src/test/java/org/apache/iceberg/parquet/TestVariantWriters.java (outdated)

huaxingao reviewed Jan 20, 2026

parquet/src/test/java/org/apache/iceberg/parquet/TestVariantWriters.java (outdated)

huaxingao reviewed Jan 20, 2026

parquet/src/test/java/org/apache/iceberg/parquet/TestVariantWriters.java (outdated)

dirtysalt added 2 commits January 20, 2026 18:27

fix per comment

5218092

fix java format

04745aa

amogh-jahagirdar reviewed Jan 20, 2026

core/src/main/java/org/apache/iceberg/variants/ShreddedObject.java (outdated)

rdblue reviewed Jan 20, 2026

parquet/src/test/java/org/apache/iceberg/parquet/TestVariantWriters.java

rdblue reviewed Jan 20, 2026

parquet/src/test/java/org/apache/iceberg/parquet/TestVariantWriters.java

dirtysalt added 3 commits January 21, 2026 09:36

fix per comment

00b22dd

use ParquetVairnatUtil to construct schema

24bb7ea

fix java format

f34042a

RussellSpitzer reviewed Jan 21, 2026

core/src/main/java/org/apache/iceberg/variants/ShreddedObject.java

RussellSpitzer reviewed Jan 21, 2026

RussellSpitzer approved these changes Jan 27, 2026

amogh-jahagirdar approved these changes Jan 30, 2026

amogh-jahagirdar merged commit e40c2d6 into apache:main Jan 30, 2026
32 checks passed

amogh-jahagirdar merged 6 commits into apache:main from dirtysalt:fix-test-variant-writer


7 participants
