[io] Write byte counts in streamer field #22430

Open
jblomer wants to merge 6 commits into root-project:Arlesienne from jblomer:streamer-field-bytecount-v2

Conversation

@jblomer
Contributor

@jblomer jblomer commented May 29, 2026

Adds support for streamer field v1, which writes the byte counts of large objects into the stream.

The first two commits should be backported.

Replaces #20608

jblomer added 2 commits May 29, 2026 13:08
The version 0 and version 1 streamer fields are exactly the same for
small objects (<=1GiB). For larger objects, the version 1 streamer field
stores the byte count stack before the data. This is up to a future
patch. For the moment, we assert that the streamed buffer is never
>1GiB.
@jblomer jblomer self-assigned this May 29, 2026
@jblomer jblomer requested review from bellenot and pcanal as code owners May 29, 2026 14:59
@jblomer jblomer requested review from enirolf, hahnjo and silverweed May 29, 2026 14:59
@jblomer jblomer changed the title from "Streamer field bytecount v2" to "[io] Write byte counts in streamer field" May 29, 2026
@github-actions

github-actions Bot commented May 29, 2026

Test Results

    22 files      22 suites   3d 10h 28m 44s ⏱️
 3 855 tests  3 854 ✅ 0 💤 1 ❌
77 031 runs  77 030 ✅ 0 💤 1 ❌

For more details on these failures, see this check.

Results for commit 3b67b13.

♻️ This comment has been updated with latest results.

@jblomer jblomer force-pushed the streamer-field-bytecount-v2 branch from fd54941 to 3b67b13 Compare May 31, 2026 19:19
if (static_cast<std::size_t>(nbytes) > kMaxSmallBuffer) {
   throw RException(R__FAIL("large objects (>1GiB) not supported by the version 0 streamer field"));
} else {
   assert(buffer.GetByteCounts().empty());
Member

For back-/forwards-porting this assert will have to be removed...

Both versions have an identical on-disk representation when the streamed object is smaller than 1GiB.
Only version 1 supports larger streamed objects.
For large objects, the version 1 streamer field prepends the large byte counts ("byte count stack") to the byte stream.
The format for the version 1 byte stream is
Member

For objects bigger than 1 GiB, the format [...] is:

The format for the version 1 byte stream is

- 64bit unsigned integer: number of elements in the large byte count list
- List if 64bit unsigned integer pairs with the byte count location and byte count value`
Member

"List of" plus stray backtick at the end of the line

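The layout under discussion (one 64-bit count followed by pairs of 64-bit location and value) can be sketched as a small encoder. This is only an illustration of the description quoted above, not the code in this PR: the names `ByteCountEntry`, `AppendUInt64LE`, `HeaderSize`, and `EncodeByteCounts` are hypothetical, and little-endian byte order is an assumption.

```cpp
#include <cstddef>
#include <cstdint>
#include <utility>
#include <vector>

// Hypothetical alias: (byte count location, byte count value).
using ByteCountEntry = std::pair<std::uint64_t, std::uint64_t>;

// Append a 64-bit unsigned integer, least significant byte first
// (little-endian byte order is an assumption of this sketch).
inline void AppendUInt64LE(std::uint64_t value, std::vector<unsigned char> &out)
{
   for (int i = 0; i < 8; ++i)
      out.push_back(static_cast<unsigned char>((value >> (8 * i)) & 0xff));
}

// Size in bytes of the prepended header: one count word plus two words per entry.
inline std::size_t HeaderSize(std::size_t nCounts)
{
   return sizeof(std::uint64_t) * (2 * nCounts + 1);
}

// Serialize the number of entries, then each (location, value) pair.
inline std::vector<unsigned char> EncodeByteCounts(const std::vector<ByteCountEntry> &counts)
{
   std::vector<unsigned char> out;
   out.reserve(HeaderSize(counts.size()));
   AppendUInt64LE(counts.size(), out);
   for (const auto &entry : counts) {
      AppendUInt64LE(entry.first, out);  // byte count location
      AppendUInt64LE(entry.second, out); // byte count value
   }
   return out;
}
```

With a single entry the header occupies 24 bytes, which matches the `sizeof(std::uint64_t) * (2 * nCounts + 1)` expression used on the read side.
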
const std::size_t szBufCounts = sizeof(std::uint64_t) * (2 * nCounts + 1);
if (szBufCounts > nbytes)
   throw RException(R__FAIL("invalid byte count size in streamer field: " + std::to_string(nCounts)));
bufCounts.resize(szBufCounts);
Member

This will zero-initialize the elements. Maybe use Internal::MakeUninitArray as for writing?
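
To illustrate the point about initialization cost: `std::vector::resize` value-initializes the new elements, whereas a plain `new unsigned char[n]` leaves them indeterminate until they are written. The sketch below contrasts the two. `MakeZeroedBuffer` and `MakeUninitBuffer` are hypothetical names, and the second is only a stand-in for the general idea, not the implementation of `Internal::MakeUninitArray`.

```cpp
#include <cstddef>
#include <memory>
#include <vector>

// resize() value-initializes the new elements, so every byte is written
// once here and then a second time when the real data is read into it.
inline std::vector<unsigned char> MakeZeroedBuffer(std::size_t n)
{
   std::vector<unsigned char> buf;
   buf.resize(n);
   return buf;
}

// new[] without an initializer default-initializes unsigned char, i.e. the
// contents are indeterminate and must be written before they are read.
// Hypothetical stand-in for illustration only.
inline std::unique_ptr<unsigned char[]> MakeUninitBuffer(std::size_t n)
{
   return std::unique_ptr<unsigned char[]>(n ? new unsigned char[n] : nullptr);
}
```

The uninitialized variant is only safe when the whole buffer is filled before use, which is the case for a buffer that is passed straight to a read call.
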

Comment on lines +1353 to +1356
std::vector<unsigned char> bufCounts(sizeof(std::uint64_t));
fAuxiliaryColumn->ReadV(collectionStart, sizeof(std::uint64_t), bufCounts.data());
std::uint64_t nCounts;
std::size_t pos = Internal::RNTupleSerializer::DeserializeUInt64(bufCounts.data(), nCounts);
Member

If using Internal::MakeUninitArray below, this could be a static array on the stack
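
A minimal sketch of the two-step read the comments above refer to: the fixed-size count word goes into a small stack array, the count is validated against the available bytes, and only then are the pairs decoded. It works on an in-memory buffer rather than on a column. `ReadUInt64LE` and `DecodeByteCounts` are hypothetical names, little-endian byte order is an assumption, and the size check is rearranged as a division so that an implausibly large count cannot wrap the multiplication. That rearrangement is a choice made for this sketch, not something taken from the PR.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <stdexcept>
#include <utility>
#include <vector>

// Decode a little-endian 64-bit unsigned integer (byte order assumed).
inline std::uint64_t ReadUInt64LE(const unsigned char *p)
{
   std::uint64_t v = 0;
   for (int i = 0; i < 8; ++i)
      v |= static_cast<std::uint64_t>(p[i]) << (8 * i);
   return v;
}

inline std::vector<std::pair<std::uint64_t, std::uint64_t>>
DecodeByteCounts(const unsigned char *stream, std::size_t nbytes)
{
   constexpr std::size_t kWord = sizeof(std::uint64_t);
   if (nbytes < kWord)
      throw std::runtime_error("stream too short for byte count header");

   // Step 1: the count field has a fixed size, so a stack array suffices.
   unsigned char head[kWord];
   std::memcpy(head, stream, kWord);
   const std::uint64_t nCounts = ReadUInt64LE(head);

   // Step 2: require kWord * (2 * nCounts + 1) <= nbytes, written as a
   // division so the check itself cannot overflow.
   if (nCounts > (nbytes / kWord - 1) / 2)
      throw std::runtime_error("invalid byte count size");

   // Step 3: decode the (location, value) pairs that follow the count.
   std::vector<std::pair<std::uint64_t, std::uint64_t>> counts;
   counts.reserve(static_cast<std::size_t>(nCounts));
   const unsigned char *p = stream + kWord;
   for (std::uint64_t i = 0; i < nCounts; ++i, p += 2 * kWord)
      counts.emplace_back(ReadUInt64LE(p), ReadUInt64LE(p + kWord));
   return counts;
}
```
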


namespace {

class FileRaii {
Member

We have this in tree/ntuple/test/ntuple_test.hxx, no?
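
For readers unfamiliar with the pattern, a file guard of this kind typically removes a test file when it goes out of scope. The sketch below assumes that behavior. It is not copied from the PR or from `ntuple_test.hxx`, and `FileGuard` is a hypothetical name chosen to avoid suggesting otherwise.

```cpp
#include <cstdio>
#include <string>
#include <utility>

// Hypothetical RAII helper: deletes the named file on destruction so that
// a test does not leave output behind, even if it exits early.
class FileGuard {
   std::string fPath;

public:
   explicit FileGuard(std::string path) : fPath(std::move(path)) {}
   FileGuard(const FileGuard &) = delete;
   FileGuard &operator=(const FileGuard &) = delete;
   ~FileGuard()
   {
      if (!fPath.empty())
         std::remove(fPath.c_str());
   }
   const std::string &GetPath() const { return fPath; }
};
```

Keeping a single shared definition in a test header, as the comment suggests, avoids each test file carrying its own copy of such a helper.
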

2 participants