[ntuple] Support reading alternative types through the RNTupleProcessor#21462
[ntuple] Support reading alternative types through the RNTupleProcessor#21462enirolf wants to merge 7 commits intoroot-project:masterfrom
RNTupleProcessor#21462Conversation
This changes requires moving the ownership of all fields from an internally stored `RNTupleModel` in each processor (`fProtoModel`) to the `RNTupleProcessorEntry`. Full removal of `fProtoModel` is taken car of in the next commit. This change also enables proper handling of subfields, for wich a test will be added in a separate commit.
RNTupleProcessorRNTupleProcessor
Test Results 22 files 22 suites 3d 8h 27m 12s ⏱️ Results for commit 091710b. ♻️ This comment has been updated with latest results. |
To distinguish them internally from regular fields that have been added explicitly by the user.
This change also includes a different way to handle field/processor naming conflicts, which previously relied on the `fProtoModel`, plus an additional test for this
To make sure that fields of "second chains" are also added correctly to the processor.
ec4f94c to
091710b
Compare
| virtual ROOT::RResult<Internal::RNTupleProcessorEntry::FieldIndex_t> | ||
| AddFieldToEntry(std::string_view fieldName, void *valuePtr, | ||
| virtual Internal::RNTupleProcessorEntry::FieldIndex_t | ||
| AddFieldToEntry(const std::string &fieldName, const std::string &typeName, void *valuePtr, |
There was a problem hiding this comment.
Could you please split all the std::string_view -> const std::string& changes into a separate commit?
| if (const auto &processorPrefix = provenance.Get(); !processorPrefix.empty()) | ||
| fieldNameWithProcessorPrefix = processorPrefix + "." + qualifiedFieldName; | ||
|
|
||
| if (FindFieldIndex(fieldNameWithProcessorPrefix)) |
There was a problem hiding this comment.
I think you can use fFieldName2Index.try_emplace() to avoid the double lookup (here and at line 209)
| /// | ||
| /// \param[in] fieldIdx The index of the field in the entry. | ||
| std::string GetFieldName(FieldIndex_t fieldIdx) const | ||
| std::string GetQualifiedFieldName(FieldIndex_t fieldIdx) const |
There was a problem hiding this comment.
At this point this can probably return a const std::string &
There was a problem hiding this comment.
Not sure, by adding more field / values later on, fProcessorValues may be forced to reallocate...
| std::vector<RProcessorValue> fProcessorValues; | ||
| std::unordered_map<std::string, FieldIndex_t> fFieldName2Index; | ||
| // Maps from the field name to all type alternatives for that field that have been added to the entry. | ||
| std::unordered_map<std::string, std::unordered_set<FieldIndex_t>> fFieldName2Index; |
There was a problem hiding this comment.
I'm not sure about using an unordered_set here: unless we expect hundreds of alternative types for a single field, a vector is probably gonna be much faster (especially if the most common case is a single element in it)
| if (posDot != std::string::npos) | ||
| onDiskFieldName = onDiskFieldName.substr(posDot + 1); | ||
|
|
||
| // Stip the "_join" prefix (for join fields) from the field name, if present. |
| fieldInfo.fValue = std::move(newValue); | ||
| fieldInfo.fIsValid = true; | ||
| } else { | ||
| fieldInfo.fIsValid = false; |
There was a problem hiding this comment.
Should this also reset fField and fValue?
| /// | ||
| /// \param[in] fieldIdx The index of the field in the entry. | ||
| std::string GetFieldName(FieldIndex_t fieldIdx) const | ||
| std::string GetQualifiedFieldName(FieldIndex_t fieldIdx) const |
There was a problem hiding this comment.
Not sure, by adding more field / values later on, fProcessorValues may be forced to reallocate...
|
|
||
| const auto &desc = fPageSource->GetSharedDescriptorGuard().GetRef(); | ||
| ROOT::RFieldZero fieldZero; | ||
| fieldZero.SetOnDiskId(desc.GetFieldZeroId()); |
There was a problem hiding this comment.
Looking at other call sites, I think I remember that the purpose of the temporary RFieldZero is to allow field substitution. If this is wanted, I think the function is missing a call to Internal::SetAllowFieldSubstitutions(fieldZero, true). And the on-disk id seems not needed for the fieldZero...
This changes requires moving the ownership of all fields from an internally stored
RNTupleModelin each processor (fProtoModel) to theRNTupleProcessorEntry. In turn, potential namin conflicts when auxiliary RNTuples in a join have the same name as a field in the primary RNTuple are now also handled differently, since this used to rely on thefProtoModel.This change implicitly adds proper support for requesting subfields, for which tests have been added.