fix(index): use exact scalar indices for filtered count_rows #6489

Open
myandpr wants to merge 2 commits into lance-format:main from myandpr:issue-2682-exact-scalar-count

Conversation

Contributor

@myandpr myandpr commented Apr 12, 2026

Fixes #2682.

Summary

  • add an exact scalar-index fast path for Dataset::count_rows(Some(filter))
  • keep the existing scanner path as the fallback when the filter is not an exact index search
  • load and query all physical segments for a logical scalar index name
  • share scalar-index fragment coverage logic between count / scanner / exec paths
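The fast-path/fallback split in the first two bullets can be sketched as a small decision function. This is a minimal illustration that assumes the three conditions named in the Approach section (exact index search, full fragment coverage, no refine step). `FilterPlan`, `CountPath`, and `choose_count_path` are hypothetical names for this sketch, not types from the Lance codebase.

```rust
/// Hypothetical summary of what a filter planner might report for one filter.
/// These fields are assumptions for illustration, not Lance's real planner output.
struct FilterPlan {
    /// True when the index search answers the filter with no false positives.
    index_is_exact: bool,
    /// True when every fragment in the dataset is covered by the index.
    covers_all_fragments: bool,
    /// True when part of the filter must still be re-checked against row data.
    has_refine_step: bool,
}

/// Which path a filtered count would take.
#[derive(Debug, PartialEq)]
enum CountPath {
    /// Count directly from the index result.
    ExactIndex,
    /// Fall back to the existing scan-based count.
    Scanner,
}

/// Take the fast path only when all three conditions hold; otherwise fall back.
fn choose_count_path(plan: &FilterPlan) -> CountPath {
    if plan.index_is_exact && plan.covers_all_fragments && !plan.has_refine_step {
        CountPath::ExactIndex
    } else {
        CountPath::Scanner
    }
}
```

Treating the scanner as the default keeps the change conservative: any filter the sketch cannot prove exact behaves as it did before.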

Problem

Dataset::count_rows(Some(filter)) always went through the scan-based count path, even when the filter could be satisfied exactly by a scalar index. That meant we still materialized row ids and counted them through the scanner.

There was also a correctness gap once a logical scalar index had multiple physical segments. Coverage checks could treat the logical index as fully covering the dataset, while query execution still loaded only one segment. In that state, an exact result could be treated as complete even though it only came from part of the logical index.

Approach

This change adds a dedicated exact-count path for filtered count_rows and only uses it when the filter planner produces an exact scalar-index search with full fragment coverage and no refine step. Exact allow-list and block-list results are counted directly, with deletion masks handled explicitly.
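As a rough illustration of counting allow-list and block-list results with deletions applied, the sketch below uses plain sets of row ids. `ExactResult` and `exact_count` are hypothetical stand-ins, and the sketch assumes every row id in the block list and the deletion set is below `total_rows`. The actual code presumably works on Lance's own row-id mask types rather than `BTreeSet`.

```rust
use std::collections::BTreeSet;

/// Hypothetical stand-in for an exact index search result.
enum ExactResult {
    /// Row ids that match the filter.
    AllowList(BTreeSet<u64>),
    /// Row ids that do NOT match; every other row matches.
    BlockList(BTreeSet<u64>),
}

/// Count matching live rows without reading any row data.
///
/// `total_rows` is the number of physical rows (deleted ones included) in the
/// covered fragments; `deleted` holds the row ids hidden by deletion masks.
/// Assumption: all ids in `deleted` and in a block list are below `total_rows`.
fn exact_count(result: &ExactResult, total_rows: u64, deleted: &BTreeSet<u64>) -> u64 {
    match result {
        // Allowed rows minus the ones that have been deleted.
        ExactResult::AllowList(allowed) => allowed.difference(deleted).count() as u64,
        // All rows minus those that are blocked or deleted; the set union
        // ensures a row that is both blocked and deleted is subtracted once.
        ExactResult::BlockList(blocked) => {
            let excluded = blocked.union(deleted).count() as u64;
            total_rows.saturating_sub(excluded)
        }
    }
}
```

The block-list branch is where deletions are easiest to get wrong: subtracting the blocked and deleted counts separately would double-count rows that appear in both sets.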

To keep coverage checks and execution aligned, scalar-index loading now aggregates all segments for the same logical index name into a LogicalScalarIndex and unions their search results.
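The idea of unioning per-segment results can be sketched with simple in-memory types. `Segment` and `LogicalIndex` below are hypothetical and only model an equality lookup; they are not the `LogicalScalarIndex` from this PR. Coverage and search both iterate over the same list of segments, so the two cannot disagree about which segments belong to the logical index.

```rust
use std::collections::{BTreeSet, HashMap};

/// Hypothetical physical segment: covers some fragments and maps a value
/// to the row ids that contain it.
struct Segment {
    fragments: BTreeSet<u32>,
    postings: HashMap<String, BTreeSet<u64>>,
}

/// Hypothetical logical index: every segment that shares one index name.
struct LogicalIndex {
    segments: Vec<Segment>,
}

impl LogicalIndex {
    /// Fragments covered by at least one segment.
    fn covered_fragments(&self) -> BTreeSet<u32> {
        self.segments
            .iter()
            .flat_map(|s| s.fragments.iter().copied())
            .collect()
    }

    /// Equality search: union the matching row ids from every segment.
    fn search_eq(&self, value: &str) -> BTreeSet<u64> {
        self.segments
            .iter()
            .filter_map(|s| s.postings.get(value))
            .flat_map(|ids| ids.iter().copied())
            .collect()
    }
}
```

If `search_eq` consulted only the first segment while `covered_fragments` looked at all of them, the result would be the mismatch described in the Problem section: a partial result reported as complete.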

Tests

  • cargo test -p lance test_fast_count_rows -- --nocapture
  • cargo test -p lance test_logical_scalar_index_search_unions_all_segments -- --nocapture
  • cargo test -p lance test_plans -- --nocapture

@github-actions github-actions bot added the bug Something isn't working label Apr 12, 2026
@myandpr myandpr changed the title fix(index): fast count_rows via exact scalar indices fix(index): use exact scalar indices for filtered count_rows Apr 12, 2026
codecov bot commented Apr 12, 2026

Development

Successfully merging this pull request may close these issues.

Don't materialize any data on count_rows if the filter can be satisfied by a scalar index