feat(compaction): support append table compaction with dv #177
lucasfang wants to merge 22 commits into alibaba:main
Pull request overview
Adds deletion-vector (DV) support needed for append-table compaction by threading DV index restoration/reading through write/scan/read paths and extending index handling.
Changes:
- Extend write-restore and writer creation to restore DV index metadata and initialize per-bucket DV maintainers.
- Refactor split readers to use a `DeletionVector::Factory` callback instead of passing deletion-file maps.
- Enhance index handling and options to support scanning DV index files and configuring DV index target file size / bitmap64 mode.
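The move from deletion-file maps to a factory callback can be illustrated with a minimal sketch. It does not reproduce the PR's actual `DeletionVector::Factory` signature; `SketchDeletionVector`, `SketchDvFactory`, `MakeDvFactory`, and `ReadLiveRows` are hypothetical names, and the assumption is that the factory maps a data file name to that file's deletion vector (or to nothing when the file has none).

```cpp
#include <cassert>
#include <cstdint>
#include <functional>
#include <memory>
#include <set>
#include <string>
#include <unordered_map>
#include <utility>
#include <vector>

// Hypothetical stand-in for a deletion vector: the set of deleted row positions.
struct SketchDeletionVector {
    std::set<int64_t> deleted_rows;
    bool IsDeleted(int64_t row) const { return deleted_rows.count(row) > 0; }
};

// Hypothetical factory type (an assumption, not the PR's typedef):
// given a data file name, return its deletion vector, or nullptr if it has none.
using SketchDvFactory =
    std::function<std::shared_ptr<SketchDeletionVector>(const std::string& file_name)>;

// Build a factory over deletion vectors that were already restored from index
// metadata; here that metadata is simply assumed to be an in-memory map.
SketchDvFactory MakeDvFactory(
    std::unordered_map<std::string, std::shared_ptr<SketchDeletionVector>> dvs) {
    return [dvs = std::move(dvs)](const std::string& file_name)
               -> std::shared_ptr<SketchDeletionVector> {
        auto it = dvs.find(file_name);
        return it == dvs.end() ? nullptr : it->second;
    };
}

// Reader side: the reader only sees the callback, not the underlying map.
// Returns the row positions in [0, row_count) that are not marked deleted.
std::vector<int64_t> ReadLiveRows(const std::string& file_name, int64_t row_count,
                                  const SketchDvFactory& dv_factory) {
    // An empty factory means "no deletion vectors are in use".
    std::shared_ptr<SketchDeletionVector> dv =
        dv_factory ? dv_factory(file_name) : nullptr;
    std::vector<int64_t> live;
    for (int64_t row = 0; row < row_count; ++row) {
        if (!dv || !dv->IsDeleted(row)) {
            live.push_back(row);
        }
    }
    return live;
}
```

One plausible benefit of this shape is that a reader which must reject deletion vectors can check whether a factory was supplied at all, rather than inspecting whether a map happens to be empty.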
Reviewed changes
Copilot reviewed 35 out of 35 changed files in this pull request and generated 9 comments.
| File | Description |
|---|---|
| src/paimon/core/table/source/table_scan.cpp | Update IndexFileHandler construction to pass filesystem and DV-related options. |
| src/paimon/core/postpone/postpone_bucket_file_store_write.h | Plumb DV maintainer factory into writer creation signature changes. |
| src/paimon/core/operation/write_restore.h | Expand restore API to be partition/bucket scoped and optionally scan DV index. |
| src/paimon/core/operation/raw_file_split_read.h | Add DV factory-based reader creation and update DV/index application signature. |
| src/paimon/core/operation/raw_file_split_read.cpp | Implement DV factory overload and switch DV reading to factory callback. |
| src/paimon/core/operation/merge_file_split_read.h | Replace deletion-file map usage with DV factory across merge read APIs. |
| src/paimon/core/operation/merge_file_split_read.cpp | Build DV factory from index metadata and use it during merge/no-merge reads. |
| src/paimon/core/operation/key_value_file_store_write.h | Update writer creation signature to accept restore info + DV maintainer. |
| src/paimon/core/operation/key_value_file_store_write.cpp | Use restored max sequence number and new writer signature. |
| src/paimon/core/operation/file_system_write_restore.h | Add IndexFileHandler support and scan DV index files during restore. |
| src/paimon/core/operation/file_store_write.cpp | Create DV maintainer factory for append tables and pass into append writer. |
| src/paimon/core/operation/data_evolution_split_read.h | Update signature to accept DV factory and reject DVs via factory presence. |
| src/paimon/core/operation/data_evolution_split_read.cpp | Reject DV usage based on DV factory rather than deletion map emptiness. |
| src/paimon/core/operation/append_only_file_store_write.h | Wire DV maintainer factory into append-only write + compaction read path. |
| src/paimon/core/operation/append_only_file_store_write.cpp | Use DV maintainer during compaction; pass DV factory into raw split reader. |
| src/paimon/core/operation/abstract_split_read.h | Switch split-read plumbing from deletion-file maps to DV factory callback. |
| src/paimon/core/operation/abstract_split_read.cpp | Thread DV factory into field-mapping readers and DV/index application. |
| src/paimon/core/operation/abstract_file_store_write.h | Add DV maintainer factory to writer base; refactor writer creation API. |
| src/paimon/core/operation/abstract_file_store_write.cpp | Restore DV index metadata, create DV maintainer per bucket, refactor caching. |
| src/paimon/core/index/index_file_handler_test.cpp | Update test construction for new IndexFileHandler constructor. |
| src/paimon/core/index/index_file_handler.h | Add DV index helpers and partition+bucket scan API to IndexFileHandler. |
| src/paimon/core/index/index_file_handler.cpp | Implement partition+bucket scan convenience method. |
| src/paimon/core/global_index/global_index_scan_impl.cpp | Update IndexFileHandler construction to pass DV options. |
| src/paimon/core/deletionvectors/deletion_vectors_index_file.h | Add DV index reading APIs and a version-check helper. |
| src/paimon/core/deletionvectors/deletion_vectors_index_file.cpp | Implement reading all deletion vectors from DV index files. |
| src/paimon/core/deletionvectors/deletion_vector.h | Add DV factory typedef and stream-based DV read API. |
| src/paimon/core/deletionvectors/deletion_vector.cpp | Implement stream-based DV read; add bitmap64 header dependency. |
| src/paimon/core/deletionvectors/bucketed_dv_maintainer.h | Add factory for restoring a BucketedDvMaintainer from index metadata. |
| src/paimon/core/deletionvectors/bitmap64_deletion_vector.h | Introduce placeholder type/magic number for bitmap64 DV. |
| src/paimon/core/deletionvectors/apply_deletion_vector_batch_reader.h | Change DV ownership to shared_ptr for reader application. |
| src/paimon/core/core_options_test.cpp | Add coverage for new DV options defaults and parsing. |
| src/paimon/core/core_options.h | Add getters for DV bitmap64 and DV index target file size. |
| src/paimon/core/core_options.cpp | Parse/store new DV options and expose via getters. |
| src/paimon/common/defs.cpp | Register new DV option keys. |
| include/paimon/defs.h | Document new DV option keys in the public options API. |
Copilot reviewed 50 out of 50 changed files in this pull request and generated 9 comments.
Copilot reviewed 68 out of 68 changed files in this pull request and generated 12 comments.
Purpose
Support append-table compaction with deletion vectors (DV), and support real cancellation of append compaction tasks by adding a boolean flag to CompactionTask.
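As a rough illustration of flag-based cancellation, the sketch below shows a task that checks a boolean between units of work and stops early once it is set. It is not the PR's `CompactionTask`; `SketchCompactionTask` and all of its members are hypothetical, and using `std::atomic<bool>` so another thread can set the flag is an assumption, not something the description states.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical compaction task with a cooperative cancellation flag.
class SketchCompactionTask {
 public:
    explicit SketchCompactionTask(std::vector<std::string> files)
        : files_(std::move(files)) {}

    // May be called from another thread; the task notices it at the next check.
    void Cancel() { cancelled_.store(true); }
    bool IsCancelled() const { return cancelled_.load(); }

    // Processes files one by one. Returns true if every file was rewritten,
    // false if the task stopped early because it was cancelled.
    bool Run() {
        for (const std::string& file : files_) {
            if (IsCancelled()) {
                return false;
            }
            RewriteFile(file);
        }
        return true;
    }

    std::size_t rewritten() const { return rewritten_; }

 private:
    // Placeholder for the real rewrite work; here it only counts calls.
    void RewriteFile(const std::string& /*file*/) { ++rewritten_; }

    std::vector<std::string> files_;
    std::atomic<bool> cancelled_{false};
    std::size_t rewritten_ = 0;
};
```

Checking the flag inside the loop is what makes the cancellation "real" in this sketch: a task that has already started stops doing work, instead of only being dropped from a queue before it runs.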
Tests
BucketedDvMaintainerTest
BucketedAppendCompactManagerTest
CompactDeletionFileTest
DeletionVectorIndexFileWriterTest
DeletionVectorsIndexFileTest
IndexFileHandlerTest
IndexManifestFileHandlerTest
FileSystemWriteRestoreTest
CompactionInteTest
API and Format
Documentation