[common] Optimize getNullCounts() to return int[] instead of Long[]#3054

Open
platinumhamburg wants to merge 1 commit into apache:main from platinumhamburg:optimize-null-counts-int-array

@platinumhamburg
Contributor

Null counts are stored as 4-byte integers in the batch statistics binary format, so getNullCounts() can return int[] instead of Long[]. This avoids boxing overhead: an int[] element takes 4 bytes, while each Long[] element is a reference to a separate Long object that carries an 8-byte payload plus an object header. -1 replaces null as the sentinel for "not available". The change reduces memory usage for cachedNullCounts and statsNullCounts, especially for wide tables with many fields.
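
To illustrate the sentinel convention described above, here is a minimal sketch, not the code in this PR. The class `NullCountsSketch`, the constant `UNKNOWN`, and both helper methods are hypothetical names introduced only for this example; the only details taken from the description are the `int[]` representation and the use of -1 for "not available".

```java
/** Hypothetical sketch of the int[] + sentinel convention; not the actual implementation. */
final class NullCountsSketch {
    /** Assumed sentinel name: -1 means the null count for a field is not available. */
    static final int UNKNOWN = -1;

    private NullCountsSketch() {}

    /** Converts the old boxed shape (null = not available) to the primitive shape (-1 = not available). */
    static int[] toPrimitive(Long[] boxed) {
        int[] out = new int[boxed.length];
        for (int i = 0; i < boxed.length; i++) {
            Long v = boxed[i];
            // Math.toIntExact throws ArithmeticException if a count does not fit in 4 bytes.
            out[i] = (v == null) ? UNKNOWN : Math.toIntExact(v);
        }
        return out;
    }

    /** Callers compare against the sentinel where they previously checked for null. */
    static boolean isAvailable(int[] nullCounts, int fieldIndex) {
        return nullCounts[fieldIndex] != UNKNOWN;
    }
}
```

Because a real null count is never negative, -1 cannot collide with a valid value, so callers that previously wrote `counts[i] != null` would check `counts[i] != -1` (or an equivalent helper) instead.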

Closes #3021

Purpose

Linked issue: close #3021

Brief change log

Tests

API and Format

Documentation


Development

Successfully merging this pull request may close these issues.

Optimize DefaultLogRecordBatchStatistics.getNullCounts() to return int[] instead of Long[]