[io] Write byte counts in streamer field#22430
Conversation
The version 0 and version 1 streamer fields are exactly the same for small objects (<=1GiB). For larger objects, the version 1 streamer field stores the byte count stack before the data. This is up to a future patch. For the moment, we assert that the streamed buffer is never >1GiB.
Test Results 22 files 22 suites 3d 10h 28m 44s ⏱️ For more details on these failures, see this check. Results for commit 3b67b13. ♻️ This comment has been updated with latest results. |
Add version 1 streamer field.
Uses RStreamerField as a test bed for (de-)serializing large objects with TBufferFile.
fd54941 to
3b67b13
Compare
| if (static_cast<std::size_t>(nbytes) > kMaxSmallBuffer) { | ||
| throw RException(R__FAIL("large objects (>1GiB) not supported by the version 0 streamer field")); | ||
| } else { | ||
| assert(buffer.GetByteCounts().empty()); |
There was a problem hiding this comment.
For back-/forwards-porting this assert will have to be removed...
| Both versions have an identical on-disk representation when the streamed object is smaller than 1GiB. | ||
| Only version 1 supports larger streamed objects. | ||
| For large objects, the version 1 streamer field prepends the large byte counts ("byte count stack") to the byte stream. | ||
| The format for the version 1 byte stream is |
There was a problem hiding this comment.
For objects bigger than 1 GiB, the format [...] is:
| The format for the version 1 byte stream is | ||
|
|
||
| - 64bit unsigned integer: number of elements in the large byte count list | ||
| - List if 64bit unsigned integer pairs with the byte count location and byte count value` |
There was a problem hiding this comment.
"List of" plus stray backtick at the end of the line
| const std::size_t szBufCounts = sizeof(std::uint64_t) * (2 * nCounts + 1); | ||
| if (szBufCounts > nbytes) | ||
| throw RException(R__FAIL("invalid byte count size in streamer field: " + std::to_string(nCounts))); | ||
| bufCounts.resize(szBufCounts); |
There was a problem hiding this comment.
This will zero-initialize the elements. Maybe use Internal::MakeUninitArray as for writing?
| std::vector<unsigned char> bufCounts(sizeof(std::uint64_t)); | ||
| fAuxiliaryColumn->ReadV(collectionStart, sizeof(std::uint64_t), bufCounts.data()); | ||
| std::uint64_t nCounts; | ||
| std::size_t pos = Internal::RNTupleSerializer::DeserializeUInt64(bufCounts.data(), nCounts); |
There was a problem hiding this comment.
If using Internal::MakeUninitArray below, this could be a static array on the stack
|
|
||
| namespace { | ||
|
|
||
| class FileRaii { |
There was a problem hiding this comment.
We have this in tree/ntuple/test/ntuple_test.hxx, no?
Adds support for streamer field v1, which writes the byte counts of large objects into the stream.
The first two commits should be backported.
Replaces #20608