fix: prevent arithmetic overflow in U64Segment encoding selection for sparse/extreme row id ranges #6516

Open
ivscheianu wants to merge 3 commits into lance-format:main from ivscheianu:segment-overflow-u128-encoding-selection

@ivscheianu (Contributor) commented Apr 14, 2026

U64Segment::from_stats_and_sequence crashes when row IDs span a large range or include values near u64::MAX. Fixes #6515

There are two independent overflow classes:

  1. Cost estimation: n_holes() and sorted_sequence_sizes() compute range spans in u64/usize that wrap for large ranges, making infeasible encodings (RangeWithHoles, RangeWithBitmap) appear cheapest. The code then attempts to materialize billions of holes or allocate multi-exabyte bitmaps.

  2. Exclusive-end: All range-backed encodings construct Range<u64> with stats.max + 1 as the exclusive end. When max == u64::MAX, this overflows even for small, memory-feasible sets (e.g., [u64::MAX - 3, u64::MAX - 1, u64::MAX]).

Both classes abort the process in debug builds and cause OOM in release builds. Across JNI, this kills the JVM with no recoverable exception.
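The wraparound behind both classes can be reproduced in isolation. The sketch below is illustrative only: `naive_slot_count` and `naive_exclusive_end` are hypothetical stand-ins, not functions from segment.rs, and they use `wrapping_add` so the wrapped value is observable in any build profile rather than panicking in debug.

```rust
// Hypothetical stand-ins for the pre-fix arithmetic (not the real segment.rs code).
// wrapping_add mirrors what plain `+` does in a release build without overflow checks.

/// Slot count of the inclusive range [min, max], computed in u64.
fn naive_slot_count(min: u64, max: u64) -> u64 {
    (max - min).wrapping_add(1)
}

/// Exclusive end of a range whose last element is `max`, computed in u64.
fn naive_exclusive_end(max: u64) -> u64 {
    max.wrapping_add(1)
}
```

For the full range, `naive_slot_count(0, u64::MAX)` wraps to 0, so any cost derived from it looks tiny. `naive_exclusive_end(u64::MAX)` also wraps to 0, which would produce an inverted range.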

Fix

n_holes() → u128 return type: The total slot count max - min + 1 can be up to 2^64, which exceeds u64::MAX. Widening to u128 gives the correct value instead of wrapping.
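A minimal sketch of the widened hole count, assuming a stats struct with `min`, `max`, and `count` fields. The struct name and field layout here are assumptions for illustration and may not match the real type in segment.rs.

```rust
/// Hypothetical stats type; field names are assumed, not taken from the PR.
struct Stats {
    min: u64,
    max: u64,
    count: u64,
}

/// Number of missing values inside [min, max], computed in u128 so that
/// the slot count max - min + 1 (up to 2^64) cannot wrap.
fn n_holes(stats: &Stats) -> u128 {
    if stats.count == 0 {
        return 0;
    }
    let total_slots = (stats.max as u128) - (stats.min as u128) + 1;
    // saturating_sub guards against inconsistent stats (count > slots).
    total_slots.saturating_sub(stats.count as u128)
}
```

With three values spread across the whole u64 domain, this returns 2^64 - 3 rather than a small wrapped number.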

sorted_sequence_sizes() → u128 arithmetic: All cost estimates are computed in u128 with saturating arithmetic, then converted via usize::try_from(...).unwrap_or(usize::MAX). Infeasible encodings saturate to usize::MAX and therefore always lose the min() comparison.
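The saturate-then-convert pattern can be sketched as follows. The cost model (8 bytes per stored u64) and the function names are assumptions chosen for illustration; the actual estimates in segment.rs may differ.

```rust
/// Assumed cost model: each stored hole or value takes 8 bytes.
const BYTES_PER_U64: u128 = 8;

/// Convert a u128 cost to usize, saturating instead of truncating.
fn to_usize_saturating(cost: u128) -> usize {
    usize::try_from(cost).unwrap_or(usize::MAX)
}

/// Estimated size of storing every hole explicitly.
fn holes_cost(n_holes: u128) -> usize {
    to_usize_saturating(n_holes.saturating_mul(BYTES_PER_U64))
}

/// Estimated size of storing every value in a sorted array.
fn sorted_array_cost(count: u64) -> usize {
    to_usize_saturating((count as u128).saturating_mul(BYTES_PER_U64))
}
```

Because an infeasible estimate becomes `usize::MAX`, taking the `min()` over candidate costs picks the sorted array whenever the hole list would not fit in memory.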

from_stats_and_sequence() → checked_add(1) gate: exclusive_end = stats.max.checked_add(1) is computed once and used as a gate for all range-backed branches. When it is None (i.e., max == u64::MAX), the function falls through to SortedArray. The bare expression stats.max + 1 no longer appears in the function.
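The gate can be sketched with a simplified two-variant encoding. The `Encoding` enum and `encode_sorted` function below are hypothetical and much smaller than the real U64Segment; they only show how `checked_add(1)` decides whether a range-backed encoding is representable.

```rust
use std::ops::Range;

/// Simplified, hypothetical encoding type (the real U64Segment has more variants).
#[derive(Debug, PartialEq)]
enum Encoding {
    Range(Range<u64>),
    SortedArray(Vec<u64>),
}

/// Encode a sorted, deduplicated slice. A Range is used only when the
/// values are contiguous AND max + 1 is representable in u64.
fn encode_sorted(values: &[u64]) -> Encoding {
    let (min, max) = match (values.first(), values.last()) {
        (Some(&min), Some(&max)) => (min, max),
        _ => return Encoding::Range(0..0),
    };
    let slots = (max as u128) - (min as u128) + 1;
    let contiguous = slots == values.len() as u128;
    // checked_add returns None when max == u64::MAX.
    match (contiguous, max.checked_add(1)) {
        (true, Some(exclusive_end)) => Encoding::Range(min..exclusive_end),
        _ => Encoding::SortedArray(values.to_vec()),
    }
}
```

A contiguous run ending at `u64::MAX` falls back to the sorted array, since its exclusive end cannot be expressed as a `u64`.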

@github-actions github-actions bot added the bug Something isn't working label Apr 14, 2026
@ivscheianu ivscheianu marked this pull request as ready for review April 14, 2026 15:26
@codecov
codecov bot commented Apr 14, 2026

Codecov Report

❌ Patch coverage is 91.07143% with 10 lines in your changes missing coverage. Please review.

| Files with missing lines | Patch % | Lines |
| --- | --- | --- |
| rust/lance-table/src/rowids/segment.rs | 91.07% | 5 Missing and 5 partials ⚠️ |

Development

Successfully merging this pull request may close these issues.

U64Segment::sorted_sequence_sizes overflows for sparse row ID ranges
