Core: Add total size fields for position and equality delete files in PartitionsTable #14820
xxubai wants to merge 10 commits into apache:main
This pull request has been marked as stale due to 30 days of inactivity. It will be closed in 1 week if no further activity occurs. If you think that’s incorrect or this pull request requires a review, please simply write any comment. If closed, you can revive the PR at any time and @mention a reviewer or discuss it on the dev@iceberg.apache.org list. Thank you for your contributions.
Hi @ebyhr @huaxingao.
cc @szehon-ho Could you please double-check this PR if you have a minute?
| partition | spec_id | record_count | file_count | total_data_file_size_in_bytes | position_delete_record_count | position_delete_file_count | total_position_delete_file_size_in_bytes | equality_delete_record_count | equality_delete_file_count | total_equality_delete_file_size_in_bytes | last_updated_at(μs) | last_updated_snapshot_id |
| {20211002, 10} | 0 | 3 | 2 | 400 | 0 | 0 | 1 | 1 | 1633169159489000 | 6941468797545315876 |
| {20211001, 10} | 0 | 7 | 4 | 700 | 0 | 0 | 0 | 0 | 1633082598716000 | 3280122546965981531 |
| {20211002, 11} | 0 | 4 | 3 | 500 | 1 | 1 | 0 | 0 | 1633172537358000 | 867027598972211003 |
Does Flink expose PartitionsTable as-is? I'm asking because this PR doesn't contain any changes in the Flink module.
Can we update the Flink tests if this PR affects that module?
@ebyhr The update is ready. Please take a look.
Changes look fine. I agree with @ebyhr, it's worth checking the Flink side. Also, how about V3? It doesn't take into account anything about DVs, right?
LGTM as well
+1 to @ebyhr's point: please double-check and, if possible, update Flink too (if any changes are required).
> Also, how about V3, it doesn't take into account anything about DV's right?
My understanding is that FileContent.POSITION_DELETES applies to both V2 and V3 deletes.
totalDeleteFileSizeInBytes(
    table.snapshot(posDeleteCommitId).addedDeleteFiles(table.io()),
nit: can we compute the iterator once and reuse it for both position and equality deletes?
Since other data files will also be iterated over multiple times, I believe a limited amount of redundancy is acceptable.
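For reference, the single-pass alternative raised in the nit could look roughly like the sketch below. It is not the PR's test code: `Content`, `DeleteFileInfo`, and `totals` are hypothetical stand-ins so the example runs without Iceberg on the classpath, and it assumes every delete file is either a position delete or an equality delete.

```java
import java.util.Arrays;
import java.util.List;

public class DeleteSizeTotals {
  // Hypothetical stand-in for the delete kinds of Iceberg's FileContent enum.
  enum Content { POSITION_DELETES, EQUALITY_DELETES }

  // Hypothetical stand-in for a delete file: only its kind and size in bytes.
  static final class DeleteFileInfo {
    final Content content;
    final long sizeInBytes;

    DeleteFileInfo(Content content, long sizeInBytes) {
      this.content = content;
      this.sizeInBytes = sizeInBytes;
    }
  }

  // Walks the files once and returns {positionDeleteBytes, equalityDeleteBytes}.
  static long[] totals(Iterable<DeleteFileInfo> files) {
    long positionBytes = 0L;
    long equalityBytes = 0L;
    for (DeleteFileInfo file : files) {
      if (file.content == Content.POSITION_DELETES) {
        positionBytes += file.sizeInBytes;
      } else {
        equalityBytes += file.sizeInBytes;
      }
    }
    return new long[] {positionBytes, equalityBytes};
  }

  public static void main(String[] args) {
    List<DeleteFileInfo> added = Arrays.asList(
        new DeleteFileInfo(Content.POSITION_DELETES, 100L),
        new DeleteFileInfo(Content.EQUALITY_DELETES, 50L),
        new DeleteFileInfo(Content.POSITION_DELETES, 200L));
    long[] result = totals(added);
    System.out.println("position=" + result[0] + " equality=" + result[1]);
  }
}
```

Whether this is worth it in a test is a judgment call: it saves one iteration over a small list at the cost of a helper that returns two values.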
… PartitionsTable Signed-off-by: xuba <xuba@cisco.com>
…tions table Signed-off-by: xuba <xuba@cisco.com>
b661ec9 to ec90f6c
Hi @singhpk234 @szehon-ho @ebyhr
Sorry for the late reply. I didn't realize it also works for V3 DVs. Can we add positive tests for V2/V3 tables to make sure it works? (I think the tests currently just assert 0L.)
Hi @szehon-ho. I added a unit test to verify the delete file status in the partitions metadata table after writing to a V3 table with deletion vectors. Could you please check whether it looks correct?
close #14803
This PR adds total_position_delete_file_size_in_bytes and total_equality_delete_file_size_in_bytes to the Partitions table.
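
To illustrate what the two new columns represent, here is a minimal sketch of per-partition accumulation by file content type. It is an assumption about the general shape of such an aggregation, not the PartitionsTable implementation: `PartitionDeleteSizes`, `Row`, and `add` are hypothetical names, and partitions are keyed by a plain string for simplicity.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class PartitionDeleteSizes {
  // Hypothetical stand-in for Iceberg's FileContent enum.
  enum Content { DATA, POSITION_DELETES, EQUALITY_DELETES }

  // Mirrors only the two columns named in the PR description.
  static final class Row {
    long totalPositionDeleteFileSizeInBytes;
    long totalEqualityDeleteFileSizeInBytes;
  }

  private final Map<String, Row> rows = new LinkedHashMap<>();

  // Adds one file's size to the matching total for its partition.
  void add(String partition, Content content, long fileSizeInBytes) {
    Row row = rows.computeIfAbsent(partition, key -> new Row());
    switch (content) {
      case POSITION_DELETES:
        row.totalPositionDeleteFileSizeInBytes += fileSizeInBytes;
        break;
      case EQUALITY_DELETES:
        row.totalEqualityDeleteFileSizeInBytes += fileSizeInBytes;
        break;
      default:
        // Data files count toward total_data_file_size_in_bytes, not modelled here.
        break;
    }
  }

  Row row(String partition) {
    return rows.get(partition);
  }

  public static void main(String[] args) {
    PartitionDeleteSizes sizes = new PartitionDeleteSizes();
    sizes.add("{20211002, 11}", Content.DATA, 500L);
    sizes.add("{20211002, 11}", Content.POSITION_DELETES, 120L);
    sizes.add("{20211002, 10}", Content.EQUALITY_DELETES, 80L);
    Row row = sizes.row("{20211002, 11}");
    System.out.println(row.totalPositionDeleteFileSizeInBytes
        + " " + row.totalEqualityDeleteFileSizeInBytes);
  }
}
```

The file sizes in `main` are made-up values for the example; only the partition keys are borrowed from the table excerpt in this thread.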