Skip to content

Core: Add total size fields for position and equality delete files in PartitionsTable#14820

Open
xxubai wants to merge 10 commits intoapache:mainfrom
xxubai:delete-file-size-in-partitions-table
Open

Core: Add total size fields for position and equality delete files in PartitionsTable#14820
xxubai wants to merge 10 commits intoapache:mainfrom
xxubai:delete-file-size-in-partitions-table

Conversation

@xxubai
Copy link
Copy Markdown
Contributor

@xxubai xxubai commented Dec 11, 2025

close #14803

This PR adds total_position_delete_file_size_in_bytes and total_equality_delete_file_size_in_bytes to Partitions Table

@xxubai xxubai requested review from ebyhr and huaxingao December 23, 2025 05:23
@github-actions
Copy link
Copy Markdown

This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.

@github-actions github-actions bot added the stale label Jan 23, 2026
@xxubai
Copy link
Copy Markdown
Contributor Author

xxubai commented Jan 23, 2026

Hi @ebyhr @huaxingao .
The code has been updated. I’d appreciate a review when you have a chance. Thanks!

Copy link
Copy Markdown
Contributor

@huaxingao huaxingao left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@huaxingao
Copy link
Copy Markdown
Contributor

cc @szehon-ho Could you please double check this PR if you have a min?

| {20211002, 11} | 0 | 4 | 3 | 500 | 1 | 1 | 0 | 0 | 1633172537358000 | 867027598972211003 |
| {20211001, 10} | 0 | 7 | 4 | 700 | 0 | 0 | 0 | 0 | 1633082598716000 | 3280122546965981531 |
| {20211002, 10} | 0 | 3 | 2 | 400 | 0 | 0 | 1 | 1 | 1633169159489000 | 6941468797545315876 |
| partition | spec_id | record_count | file_count | total_data_file_size_in_bytes | position_delete_record_count | position_delete_file_count | total_position_delete_file_size_in_bytes | equality_delete_record_count | equality_delete_file_count | total_equality_delete_file_size_in_bytes | last_updated_at(μs) | last_updated_snapshot_id |
Copy link
Copy Markdown
Member

@ebyhr ebyhr Jan 23, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does Flink expose PartitionsTable as-is? I'm asking this question because this PR doesn't contain any changes in Flink module.

Can we update Flink tests if this PR affects the module?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@ebyhr The update is ready. Please take a look.

@szehon-ho
Copy link
Copy Markdown
Member

changes look fine.

I agree with @ebyhr , worth to check flink side.

Also, how about V3, it doesn't take into account anything about DV's right?

@github-actions github-actions bot removed the stale label Jan 24, 2026
Copy link
Copy Markdown
Contributor

@singhpk234 singhpk234 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM as well

+1 to double check to @ebyhr's point if possible upgrade flink too (if any changes required)

Also, how about V3, it doesn't take into account anything about DV's right?

my understanding is FileContet.POSITION_DELETE applies for both v2 and v3 deletes

Comment on lines +1716 to +1717
totalDeleteFileSizeInBytes(
table.snapshot(posDeleteCommitId).addedDeleteFiles(table.io()),
Copy link
Copy Markdown
Contributor

@singhpk234 singhpk234 Jan 24, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: can we compute the iterator once and reuse it in both position / equality delete

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Since other data files will also be iterated over multiple times, I believe a limited amount of redundancy is acceptable.

@github-actions github-actions bot added the flink label Feb 10, 2026
… PartitionsTable

Signed-off-by: xuba <xuba@cisco.com>
Signed-off-by: xuba <xuba@cisco.com>
Signed-off-by: xuba <xuba@cisco.com>
Signed-off-by: xuba <xuba@cisco.com>
Signed-off-by: xuba <xuba@cisco.com>
…tions table

Signed-off-by: xuba <xuba@cisco.com>
@xxubai xxubai force-pushed the delete-file-size-in-partitions-table branch from b661ec9 to ec90f6c Compare February 10, 2026 05:14
Signed-off-by: xuba <xuba@cisco.com>
@xxubai
Copy link
Copy Markdown
Contributor Author

xxubai commented Mar 6, 2026

Hi @singhpk234 @szehon-ho @ebyhr
The code has been updated. I’d appreciate a review when you have a chance again. Thanks!

@szehon-ho
Copy link
Copy Markdown
Member

sorry for late reply. I didnt realize it also works for V3 DV's, can we add positive tests for V2/V3 tables to make sure it works (I think currently tests just assert 0L)

@xxubai
Copy link
Copy Markdown
Contributor Author

xxubai commented Mar 17, 2026

sorry for late reply. I didnt realize it also works for V3 DV's, can we add positive tests for V2/V3 tables to make sure it works (I think currently tests just assert 0L)

Hi @szehon-ho . I added a unit test to verify the delete file status in the partitions metadata table after writing to a V3 table with delete vectors. Could you please check whether it looks correct?

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

Add total file size for delete files to partitions table

5 participants