Skip to content

[ntuple] Support reading alternative types through the RNTupleProcessor#21462

Open
enirolf wants to merge 7 commits intoroot-project:masterfrom
enirolf:ntuple-processor-alternative-types
Open

[ntuple] Support reading alternative types through the RNTupleProcessor#21462
enirolf wants to merge 7 commits intoroot-project:masterfrom
enirolf:ntuple-processor-alternative-types

Conversation

@enirolf
Copy link
Contributor

@enirolf enirolf commented Mar 3, 2026

This changes requires moving the ownership of all fields from an internally stored RNTupleModel in each processor (fProtoModel) to the RNTupleProcessorEntry. In turn, potential namin conflicts when auxiliary RNTuples in a join have the same name as a field in the primary RNTuple are now also handled differently, since this used to rely on the fProtoModel.

This change implicitly adds proper support for requesting subfields, for which tests have been added.

This changes requires moving the ownership of all fields from an
internally stored `RNTupleModel` in each processor (`fProtoModel`) to
the `RNTupleProcessorEntry`. Full removal of `fProtoModel` is taken
car of in the next commit. This change also enables proper handling of
subfields, for wich a test will be added in a separate commit.
@enirolf enirolf self-assigned this Mar 3, 2026
@enirolf enirolf requested a review from jblomer as a code owner March 3, 2026 10:48
@enirolf enirolf changed the title [ntuple] Support alternative types in RNTupleProcessor [ntuple] Support reading alternative types through the RNTupleProcessor Mar 3, 2026
@github-actions
Copy link

github-actions bot commented Mar 3, 2026

Test Results

    22 files      22 suites   3d 8h 27m 12s ⏱️
 3 808 tests  3 807 ✅ 1 💤 0 ❌
76 632 runs  76 623 ✅ 9 💤 0 ❌

Results for commit 091710b.

♻️ This comment has been updated with latest results.

@enirolf enirolf marked this pull request as draft March 4, 2026 07:42
enirolf added 6 commits March 4, 2026 11:08
To distinguish them internally from regular fields that have been added
explicitly by the user.
This change also includes a different way to handle field/processor
naming conflicts, which previously relied on the `fProtoModel`, plus an
additional test for this
To make sure that fields of "second chains" are also added correctly to
the processor.
@enirolf enirolf force-pushed the ntuple-processor-alternative-types branch from ec4f94c to 091710b Compare March 4, 2026 10:12
@enirolf enirolf marked this pull request as ready for review March 4, 2026 12:10
virtual ROOT::RResult<Internal::RNTupleProcessorEntry::FieldIndex_t>
AddFieldToEntry(std::string_view fieldName, void *valuePtr,
virtual Internal::RNTupleProcessorEntry::FieldIndex_t
AddFieldToEntry(const std::string &fieldName, const std::string &typeName, void *valuePtr,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Could you please split all the std::string_view -> const std::string& changes into a separate commit?

if (const auto &processorPrefix = provenance.Get(); !processorPrefix.empty())
fieldNameWithProcessorPrefix = processorPrefix + "." + qualifiedFieldName;

if (FindFieldIndex(fieldNameWithProcessorPrefix))
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think you can use fFieldName2Index.try_emplace() to avoid the double lookup (here and at line 209)

///
/// \param[in] fieldIdx The index of the field in the entry.
std::string GetFieldName(FieldIndex_t fieldIdx) const
std::string GetQualifiedFieldName(FieldIndex_t fieldIdx) const
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At this point this can probably return a const std::string &

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, by adding more field / values later on, fProcessorValues may be forced to reallocate...

std::vector<RProcessorValue> fProcessorValues;
std::unordered_map<std::string, FieldIndex_t> fFieldName2Index;
// Maps from the field name to all type alternatives for that field that have been added to the entry.
std::unordered_map<std::string, std::unordered_set<FieldIndex_t>> fFieldName2Index;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure about using an unordered_set here: unless we expect hundreds of alternative types for a single field, a vector is probably gonna be much faster (especially if the most common case is a single element in it)

if (posDot != std::string::npos)
onDiskFieldName = onDiskFieldName.substr(posDot + 1);

// Stip the "_join" prefix (for join fields) from the field name, if present.
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Strip

fieldInfo.fValue = std::move(newValue);
fieldInfo.fIsValid = true;
} else {
fieldInfo.fIsValid = false;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Should this also reset fField and fValue?

///
/// \param[in] fieldIdx The index of the field in the entry.
std::string GetFieldName(FieldIndex_t fieldIdx) const
std::string GetQualifiedFieldName(FieldIndex_t fieldIdx) const
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Not sure, by adding more field / values later on, fProcessorValues may be forced to reallocate...


const auto &desc = fPageSource->GetSharedDescriptorGuard().GetRef();
ROOT::RFieldZero fieldZero;
fieldZero.SetOnDiskId(desc.GetFieldZeroId());
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Looking at other call sites, I think I remember that the purpose of the temporary RFieldZero is to allow field substitution. If this is wanted, I think the function is missing a call to Internal::SetAllowFieldSubstitutions(fieldZero, true). And the on-disk id seems not needed for the fieldZero...

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Projects

None yet

Development

Successfully merging this pull request may close these issues.

3 participants