feat(compaction): support multi-level lookup in LSM tree #179
Merged
lxy-9602 merged 9 commits into alibaba:main on Mar 16, 2026
Pull request overview
This PR adds a multi-level key lookup facility for MergeTree/LSM levels (including Level-0 handling), introduces supporting data structures (Levels, SortedRun utilities, local lookup-file caching), and updates the file-reading API to make GetPreviousBatchFirstRowNumber() fallible (Result<uint64_t>), with cascading updates across formats/readers and tests.
Changes:
- Introduce MergeTree level modeling + lookup implementation (Levels, LookupLevels, LookupFile, LookupUtils) with new unit tests.
- Update FileBatchReader::GetPreviousBatchFirstRowNumber() to return Result<uint64_t> and propagate the change through format readers, wrappers, and tests.
- Extend KeyValue data-file reading to expose file positions (NextWithFilePos) and add small utility improvements (IOManager, EqualsIgnoreNullable, MoveVector, operator!=).
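The MoveVector helper mentioned in the last bullet bridges vectors of FileBatchReader pointers into vectors of BatchReader pointers. The sketch below shows one way such a helper could look. The signature, the `static_assert`, and the two stand-in reader structs are assumptions for illustration, not the code in ObjectUtils.

```cpp
#include <cassert>
#include <memory>
#include <type_traits>
#include <utility>
#include <vector>

// Hypothetical stand-ins for the reader hierarchy named in this PR.
struct BatchReader {
    virtual ~BatchReader() = default;
    virtual int Kind() const { return 0; }
};
struct FileBatchReader : BatchReader {
    int Kind() const override { return 1; }
};

// Sketch of a MoveVector-style helper (assumed signature): moves every element
// of `src` into a new vector whose element type is convertible from the source
// element type, e.g. unique_ptr<Derived> -> unique_ptr<Base>.
template <typename To, typename From>
std::vector<To> MoveVector(std::vector<From>&& src) {
    static_assert(std::is_convertible_v<From&&, To>,
                  "source elements must convert to the target element type");
    std::vector<To> dst;
    dst.reserve(src.size());
    for (auto& item : src) {
        dst.push_back(std::move(item));  // ownership moves element by element
    }
    src.clear();  // leave the source in a well-defined empty state
    return dst;
}
```

A caller would write something like `MoveVector<std::unique_ptr<BatchReader>>(std::move(file_readers))` before handing the readers to a consumer that only knows the base type.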
Reviewed changes
Copilot reviewed 72 out of 72 changed files in this pull request and generated 13 comments.
| File | Description |
|---|---|
| src/paimon/testing/mock/mock_key_value_data_file_record_reader.h | Adjust mock record reader to accept FileBatchReader. |
| src/paimon/testing/mock/mock_file_batch_reader.h | Update mock to return Result<uint64_t> for previous batch row. |
| src/paimon/format/parquet/parquet_file_batch_reader_test.cpp | Update tests for Result<uint64_t> return type. |
| src/paimon/format/parquet/parquet_file_batch_reader.h | Update override signature to Result<uint64_t>. |
| src/paimon/format/parquet/file_reader_wrapper_test.cpp | Update wrapper tests for Result<uint64_t>. |
| src/paimon/format/parquet/file_reader_wrapper.h | Update wrapper accessor to Result<uint64_t>. |
| src/paimon/format/orc/orc_file_batch_reader_test.cpp | Update ORC tests for Result<uint64_t>. |
| src/paimon/format/orc/orc_file_batch_reader.h | Update override signature to Result<uint64_t>. |
| src/paimon/format/lance/lance_format_reader_writer_test.cpp | Add coverage for previous-batch row tracking + bitmap pushdown invalid case. |
| src/paimon/format/lance/lance_file_batch_reader.h | Implement GetPreviousBatchFirstRowNumber() with bitmap-pushdown restriction. |
| src/paimon/format/lance/lance_file_batch_reader.cpp | Track previous/last batch row numbers and reset on schema change. |
| src/paimon/format/blob/blob_file_batch_reader_test.cpp | Update tests for Result<uint64_t> and adjust bitmap case coverage. |
| src/paimon/format/blob/blob_file_batch_reader.h | Return invalid status when previous-batch row number is undefined under bitmap pushdown. |
| src/paimon/format/avro/avro_file_batch_reader_test.cpp | Update tests for Result<uint64_t>. |
| src/paimon/format/avro/avro_file_batch_reader.h | Update override signature to Result<uint64_t>. |
| src/paimon/core/operation/raw_file_split_read.h | Adjust DV/index application helper to return FileBatchReader. |
| src/paimon/core/operation/raw_file_split_read.cpp | Convert vectors of FileBatchReader to BatchReader via ObjectUtils::MoveVector. |
| src/paimon/core/operation/merge_file_split_read.h | Adjust DV/index application helper to return FileBatchReader. |
| src/paimon/core/operation/merge_file_split_read.cpp | Push down bitmap DV for BitmapDeletionVector where supported; use MoveVector. |
| src/paimon/core/operation/data_evolution_split_read.h | Adjust DV/index application helper to return FileBatchReader. |
| src/paimon/core/operation/data_evolution_split_read.cpp | Use MoveVector to bridge FileBatchReader vectors into ConcatBatchReader. |
| src/paimon/core/operation/abstract_split_read.h | Make raw file reader creation return FileBatchReader and expose it for lookup use. |
| src/paimon/core/operation/abstract_split_read.cpp | Propagate FileBatchReader through field-mapping reader creation. |
| src/paimon/core/mergetree/sorted_run_test.cpp | Add tests for sorting/validating runs and equality. |
| src/paimon/core/mergetree/sorted_run.h | Add Empty/FromUnsorted helpers and equality operator. |
| src/paimon/core/mergetree/lookup_utils.h | New: shared lookup helpers for Level-0 and sorted levels. |
| src/paimon/core/mergetree/lookup_levels_test.cpp | New: multi-level lookup tests (Level-0 and multi-file scenarios). |
| src/paimon/core/mergetree/lookup_levels.h | New: lookup-by-key facade across Levels with caching and processor management. |
| src/paimon/core/mergetree/lookup_levels.cpp | New: implementation that builds local lookup files from data files and looks up keys. |
| src/paimon/core/mergetree/lookup_file_test.cpp | New: tests for LookupFile behavior and local file prefix generation. |
| src/paimon/core/mergetree/lookup_file.h | New: local lookup-file wrapper with auto-close and delete-on-close. |
| src/paimon/core/mergetree/lookup/default_lookup_serializer_factory.h | Use ArrowUtils::EqualsIgnoreNullable and improve schema mismatch error message. |
| src/paimon/core/mergetree/levels_test.cpp | New: tests for Levels construction, update, and ordering. |
| src/paimon/core/mergetree/levels.h | New: Levels container for Level-0 set + per-level sorted runs. |
| src/paimon/core/mergetree/levels.cpp | New: Levels implementation (create/update/query). |
| src/paimon/core/mergetree/level_sorted_run.h | Add equality operator for test comparisons. |
| src/paimon/core/mergetree/compact/merge_tree_compact_rewriter_test.cpp | Adjust rewriter creation to new signature (no executor arg). |
| src/paimon/core/mergetree/compact/merge_tree_compact_rewriter.h | Remove executor from factory signature. |
| src/paimon/core/mergetree/compact/merge_tree_compact_rewriter.cpp | Use default executor internally and simplify ReadContext builder. |
| src/paimon/core/key_value.h | Add default constructor for KeyValue. |
| src/paimon/core/io/key_value_data_file_record_reader_test.cpp | Add tests for selected-bitmap + file-position iteration. |
| src/paimon/core/io/key_value_data_file_record_reader.h | Switch to FileBatchReader, add iterator file-position support. |
| src/paimon/core/io/key_value_data_file_record_reader.cpp | Implement NextWithFilePos() using previous-batch row offset. |
| src/paimon/core/io/field_mapping_reader.h | Make FieldMappingReader a FileBatchReader and forward required APIs. |
| src/paimon/core/io/field_mapping_reader.cpp | Adjust ctor signature to accept FileBatchReader. |
| src/paimon/core/io/data_file_meta_test.cpp | Add equality/inequality assertions. |
| src/paimon/core/io/data_file_meta.h | Add operator!=. |
| src/paimon/core/io/data_file_meta.cpp | Implement operator!=. |
| src/paimon/core/io/complete_row_tracking_fields_reader.h | Update override signature to Result<uint64_t>. |
| src/paimon/core/io/complete_row_tracking_fields_reader.cpp | Handle Result<uint64_t> for previous batch row. |
| src/paimon/core/disk/io_manager.cpp | New: IOManager implementation for temp file path generation. |
| src/paimon/core/deletionvectors/apply_deletion_vector_batch_reader.h | Make wrapper a FileBatchReader and forward required APIs. |
| src/paimon/common/utils/object_utils_test.cpp | Add tests for MoveVector. |
| src/paimon/common/utils/object_utils.h | Add MoveVector helper for moving vectors of convertible pointer types. |
| src/paimon/common/utils/binary_row_partition_computer_test.cpp | Add tests for PartToSimpleString. |
| src/paimon/common/utils/binary_row_partition_computer.h | Add PartToSimpleString API. |
| src/paimon/common/utils/binary_row_partition_computer.cpp | Implement PartToSimpleString. |
| src/paimon/common/utils/arrow/arrow_utils_test.cpp | Add tests for EqualsIgnoreNullable. |
| src/paimon/common/utils/arrow/arrow_utils.h | Add EqualsIgnoreNullable declaration. |
| src/paimon/common/utils/arrow/arrow_utils.cpp | Implement EqualsIgnoreNullable. |
| src/paimon/common/sst/sst_file_reader.cpp | Silence unused variable with [[maybe_unused]]. |
| src/paimon/common/sst/block_reader.h | Make index shared_ptr param const-ref. |
| src/paimon/common/reader/prefetch_file_batch_reader_impl_test.cpp | Update tests for Result<uint64_t>. |
| src/paimon/common/reader/prefetch_file_batch_reader_impl.h | Update signature to Result<uint64_t>. |
| src/paimon/common/reader/prefetch_file_batch_reader_impl.cpp | Propagate Result<uint64_t> in internal handling. |
| src/paimon/common/reader/delegating_prefetch_reader.h | Update override signature to Result<uint64_t>. |
| src/paimon/common/file_index/bitmap/apply_bitmap_index_batch_reader.h | Convert wrapper to FileBatchReader and propagate previous-row result. |
| src/paimon/common/data/timestamp_test.cpp | Use ASSERT_NE for inequality. |
| src/paimon/CMakeLists.txt | Register new core sources and tests. |
| include/paimon/reader/file_batch_reader.h | API change: GetPreviousBatchFirstRowNumber() returns Result<uint64_t>. |
| include/paimon/disk/io_manager.h | New public IOManager interface. |
| include/paimon/data/timestamp.h | Add operator!=. |
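The table notes that NextWithFilePos() is implemented "using previous-batch row offset". A minimal sketch of that idea follows, assuming the rows inside a batch are contiguous in the file, so a row's file position is the batch's first row number plus its index within the batch. `Batch` and `PositionedIterator` are hypothetical names, and the real reader works on Arrow batches rather than plain integers.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <utility>
#include <vector>

// Hypothetical batch: the file row number of its first row plus its values.
struct Batch {
    uint64_t first_row_number;
    std::vector<int> values;
};

// Hypothetical iterator that yields (file position, value) pairs.
class PositionedIterator {
public:
    explicit PositionedIterator(std::vector<Batch> batches)
        : batches_(std::move(batches)) {}

    // Returns the next (file_position, value), or nullopt when exhausted.
    std::optional<std::pair<uint64_t, int>> NextWithFilePos() {
        // Skip batches that are empty or fully consumed.
        while (batch_idx_ < batches_.size() &&
               row_idx_ >= batches_[batch_idx_].values.size()) {
            ++batch_idx_;
            row_idx_ = 0;
        }
        if (batch_idx_ >= batches_.size()) {
            return std::nullopt;
        }
        const Batch& batch = batches_[batch_idx_];
        // Assumption: rows in a batch are contiguous in the file.
        uint64_t file_pos = batch.first_row_number + row_idx_;
        int value = batch.values[row_idx_];
        ++row_idx_;
        return std::make_pair(file_pos, value);
    }

private:
    std::vector<Batch> batches_;
    std::size_t batch_idx_ = 0;
    std::size_t row_idx_ = 0;
};
```

The contiguity assumption is the part that matters here: if a reader may skip rows inside a batch, a single first-row offset no longer identifies each row, which is consistent with the Lance and Blob readers reporting an invalid status under bitmap pushdown.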
lucasfang reviewed on Mar 16, 2026
Purpose
This PR introduces a comprehensive lookup mechanism for the MergeTree data structure, enabling efficient key-based lookups across multiple levels of data files. The implementation provides various value processing strategies.
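For readers unfamiliar with LSM lookups, the following is a rough sketch of how a key can be resolved across levels. It assumes the usual LSM layout: Level-0 files may overlap and are checked newest first, while each higher level is a sorted run of non-overlapping files that can be binary-searched by key range. All types and function names here are hypothetical in-memory stand-ins and do not reproduce the LookupLevels or LookupUtils code in this PR.

```cpp
#include <algorithm>
#include <cassert>
#include <map>
#include <optional>
#include <string>
#include <vector>

// Hypothetical in-memory stand-in for a data file: a key range plus its rows.
struct DataFile {
    int min_key;
    int max_key;
    std::map<int, std::string> rows;

    std::optional<std::string> Get(int key) const {
        auto it = rows.find(key);
        if (it == rows.end()) return std::nullopt;
        return it->second;
    }
};

// Hypothetical level container (not the Levels class from this PR).
struct LevelsSketch {
    std::vector<DataFile> level0;                    // newest first, ranges may overlap
    std::vector<std::vector<DataFile>> sorted_runs;  // level 1..N, each sorted and non-overlapping
};

// Level-0: probe every file whose range covers the key, newest first.
std::optional<std::string> LookupLevel0(const std::vector<DataFile>& level0, int key) {
    for (const DataFile& file : level0) {
        if (key < file.min_key || key > file.max_key) continue;
        if (auto value = file.Get(key)) return value;
    }
    return std::nullopt;
}

// Sorted run: binary-search for the single file whose range can hold the key.
std::optional<std::string> LookupSortedRun(const std::vector<DataFile>& run, int key) {
    auto it = std::lower_bound(
        run.begin(), run.end(), key,
        [](const DataFile& file, int k) { return file.max_key < k; });
    if (it == run.end() || key < it->min_key) return std::nullopt;
    return it->Get(key);
}

// The first hit wins, so newer data shadows older data in deeper levels.
std::optional<std::string> Lookup(const LevelsSketch& levels, int key) {
    if (auto value = LookupLevel0(levels.level0, key)) return value;
    for (const auto& run : levels.sorted_runs) {
        if (auto value = LookupSortedRun(run, key)) return value;
    }
    return std::nullopt;
}
```

The sketch leaves out everything that makes the real feature practical, such as building and caching local lookup files and the value processing strategies mentioned above.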
Linked issue: #93
Tests
core/mergetree/levels_test.cpp
core/mergetree/lookup_file_test.cpp
core/mergetree/lookup_levels_test.cpp
API and Format
In class FileBatchReader:
virtual uint64_t GetPreviousBatchFirstRowNumber() const = 0;
=> virtual Result<uint64_t> GetPreviousBatchFirstRowNumber() const = 0;
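Callers that previously used the returned row number directly now have to check for failure first. The sketch below illustrates the pattern with a minimal stand-in `Result<T>`; the project's real Result class, its error type, and any propagation macros are not shown in this PR text, so `ok()`, `value()`, `error()`, `Invalid()`, `SketchReader`, and `FilePositionOf` are all assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <utility>
#include <variant>

// Minimal stand-in for a Result<T> type (assumed interface, not the project's).
template <typename T>
class Result {
public:
    Result(T value) : data_(std::move(value)) {}  // implicit on purpose: `return x;`
    static Result Invalid(std::string message) {
        return Result(Error{std::move(message)});
    }
    bool ok() const { return std::holds_alternative<T>(data_); }
    const T& value() const {
        assert(ok());
        return std::get<T>(data_);
    }
    const std::string& error() const {
        assert(!ok());
        return std::get<Error>(data_).message;
    }

private:
    struct Error {
        std::string message;
    };
    explicit Result(Error error) : data_(std::move(error)) {}
    std::variant<T, Error> data_;
};

// The interface after this PR: the accessor can now fail.
class FileBatchReader {
public:
    virtual ~FileBatchReader() = default;
    virtual Result<uint64_t> GetPreviousBatchFirstRowNumber() const = 0;
};

// Hypothetical reader that cannot report a row number under bitmap pushdown.
class SketchReader : public FileBatchReader {
public:
    SketchReader(uint64_t first_row, bool bitmap_pushed_down)
        : first_row_(first_row), bitmap_pushed_down_(bitmap_pushed_down) {}

    Result<uint64_t> GetPreviousBatchFirstRowNumber() const override {
        if (bitmap_pushed_down_) {
            return Result<uint64_t>::Invalid(
                "previous batch row number is undefined under bitmap pushdown");
        }
        return first_row_;
    }

private:
    uint64_t first_row_;
    bool bitmap_pushed_down_;
};

// Caller side: propagate the failure instead of using a meaningless number.
Result<uint64_t> FilePositionOf(const FileBatchReader& reader, uint64_t index_in_batch) {
    Result<uint64_t> first = reader.GetPreviousBatchFirstRowNumber();
    if (!first.ok()) return first;
    return first.value() + index_in_batch;
}
```

With the old `uint64_t` signature, a reader in the bitmap-pushdown case had no way to signal that the value was undefined; the fallible return type makes that state explicit at every call site.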