pyjsroot -- a jsroot server working with pyh101.
- See intro.md for an estimate of how much boilerplate is added with each new detector.
- UCESB is very good at what it does, but its raw output on a mapped level is hard to interpret, especially for `ZERO_SUPPRESS_MULTI`.
- Keeping R3BRoot/r3bsource synced with UCESB is a chore. There is a lot of boilerplate to map all arrays.
- Using r3bsource (e.g. from the "macro") seems like a particularly mean-spirited joke -- I have to locally declare a struct which contains the memory of all the `ext_h101` structures, and then pass these to the various Readers, e.g.

  ```cpp
  struct EXT_STR_h101_t
  {
      ...
      EXT_STR_h101_LOS_t los;
      ...
  };

  // in main()
  auto los_reader = new R3BLosReader(&ucesb_struct.los, offsetof(EXT_STR_h101_t, los));
  ```

  Note how R3BRoot has a class to read LOS data from UCESB, precisely so that the user does not have to deal with `struct EXT_STR_h101_LOS_t`, but nevertheless requires the user to manually decide on a memory layout for their overall ucesb struct.
- UCESB converts the electronics channels to physics channels, which is great unless you require the TDC trigger time of the electronics module for that channel, in which case you need to break the abstraction/mapping and figure out which trigger time is actually the one you want to subtract. This is typically solved by transporting mapping information from the unpacker to R3BRoot, using an autogenerated header defining a mapping array if you are lucky, or a FairParam if you are not.
- Once the data is converted to R3BRoot-native per-detector-class TClonesArrays by per-detector classes, it is further processed, which likely involves per-detector calibration parameter container classes (not for tcal, but for example for trigger mapping). My opinions about all of this are well known. Let us instead talk about the final step, which is histogramming.
Most histogramming code I see in nuclear physics feels deeply unaesthetic to me.
Consider R3BCalifaOnlineSpectra (.h, .cxx) as a typical example. That class fills 31 different histogram entities of various dimensionality. Single histograms are declared as pointers, one-dimensional arrays of histograms as std::vectors of pointers, and multi-dimensional arrays of histograms as C arrays of pointers.
In the simplest case (single histograms), a declaration reads like this:
```cpp
TH2F* fh2_Califa_coinPhi;
```
In the corresponding cxx file, we run over this histogram in different places.
620 lines deep in the Init() method, it is initialized:
```cpp
fh2_Califa_coinPhi =
    R3B::root_owned<TH2F>("fh2_Califa_phi_correlations", "Califa phi correlations", 600, -190, 190, 600, -190, 190);
```
A few more lines are then spent to create a TCanvas, set axis labels and draw it using "COLZ".
The next place we find it is an exceptionally short method, Reset(), which clears all histograms. Our histogram is mentioned 94 lines into that method:
```cpp
fh2_Califa_coinPhi->Reset();
```
The final place it is used is where it is actually filled, 302 lines within the Exec() method:

```cpp
fh2_Califa_coinPhi->Fill(califa_phi[i1], califa_phi[i2]);
```
The lack of a destructor is by design, and the fact that the pointer is uninitialized between object construction and Init() could be fixed fairly easily by using default member initializers. Also, iterating through all the histograms for Reset() could be avoided fairly easily by adding every allocated histogram to a vector. Still, that leaves three different places where the histogram pointer appears, which is about two too many.
My early attempts to generate a single definition of a histogram included rewriting, more than a decade ago, the Califa GO4 code around a struct HistogramAxis, which contained information on range, binning, and axis labeling, plus a function pointer to extract the value to be histogrammed. You would pass one or more of these axis objects to the constructor of your histogramming class, and the class would histogram just the quantities you were interested in.
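The HistogramAxis idea translates naturally to Python. Here is a minimal sketch of the concept (all names are hypothetical, not the actual GO4 or pyjsroot API, and a toy dict stands in for a ROOT histogram): the histogram is declared in exactly one place, and booking, filling, and resetting all derive from the axis objects.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HistogramAxis:
    """One axis: binning, labeling, and how to extract the value from an event."""
    label: str
    bins: int
    low: float
    high: float
    extract: Callable  # maps an event to the value for this axis

class AutoHistogram:
    """Declared once; filling and resetting need no further per-histogram code."""
    def __init__(self, name, *axes):
        self.name = name
        self.axes = axes
        self.counts = {}  # toy storage: tuple of bin indices -> count

    def _bin(self, axis, value):
        if not (axis.low <= value < axis.high):
            return None  # out of range, discard
        return int((value - axis.low) / (axis.high - axis.low) * axis.bins)

    def fill(self, event):
        idx = tuple(self._bin(a, a.extract(event)) for a in self.axes)
        if None not in idx:
            self.counts[idx] = self.counts.get(idx, 0) + 1

    def reset(self):
        self.counts.clear()

# usage: the 2D phi-correlation histogram from above, defined in one place
phi1 = HistogramAxis("phi1 / deg", 600, -190, 190, lambda ev: ev["phi"][0])
phi2 = HistogramAxis("phi2 / deg", 600, -190, 190, lambda ev: ev["phi"][1])
h = AutoHistogram("coinPhi", phi1, phi2)
h.fill({"phi": [10.0, -20.0]})
```

The point is not the toy storage but the shape of the interface: a class that owns its histograms can reset or serialize all of them generically, instead of touching each pointer in three separate methods.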
A few years later, I implemented something similar for R3BRoot collections. Only this time, it was based on template metaprogramming, which meant it could have been turned into something optimizable to zero runtime overhead. Alas, template error messages are generally not very user-friendly. Still, it was probably the most fun I ever had with a C++ compiler.
This time, I am aiming for something a bit more practical.
I based this on pyh101, a recent framework for handling ucesb unpacker data from Python. I am of the firm opinion that C++ is not a suitable language for data analysis within R3B. While there are certainly people in the R3B collaboration whose C++ I would trust as much as anyone's (which is not as faint a praise as you might think), most of the R3BRoot contributors do not have an adequate knowledge of C++, and the review process is basically non-existent. To be fair, "adequate knowledge of C++" is a big ask -- it covers everything from "don't put a leading zero in front of your integer constants unless you like octal numbers" over constness, templates, how to get sane multi-dimensional arrays, and exception handling, to a ton of similarly esoteric topics. Take object ownership -- I have seen a ton of code to the effect of:
```cpp
R3BFoo::Init()
{
    fBarCA = dynamic_cast<...>(FairRootManager::Instance()->GetObject("Bar"));
}

R3BFoo::~R3BFoo()
{
    if (fBarCA)
        delete fBarCA;
}
```
Whoever writes such code clearly has an insufficient understanding of object ownership. FairRootManager::GetObject() does not return a unique pointer per call for you to do with as you please; it returns a pointer to an object which is also used in other places, and you have no business deleting it. Cargo-cult programming on the level of "I have seen others delete their pointer members in the destructor, hence I should call delete on all of my member pointers" does not cut it. The unnecessary if is just adding insult to injury.
So as I was starting from scratch in any case, I recognized that getting contributors to the point where they can contribute safe Python is much easier than getting them to the point where they can contribute safe C++. Less stuff to unlearn, for one thing.
As far as alternatives to C++ go, Python has some clear advantages:
- It is very general-purpose. From running I2C on a RasPi, over numerical algorithms from numpy/scipy, to neural networks, it can be used for all kinds of purposes.
- It is a straightforward imperative language, which is what people are most likely to have experience with.
- It offers a well-documented interface for implementing modules in C.
- ROOT offers Python bindings. While the merits of ROOT are debatable (it improved a lot with ROOT6, but a lot of outdated, weird design choices are grandfathered in), I did not want to change everything at once. I have long felt that PyROOT is the preferable way to interact with ROOT. ROOT's devil-may-care attitude towards type safety ("anything is a TObject") plays much better with Python's duck typing than with C++'s strict typing requirements. All the `dynamic_cast` operators are just not very straightforward for most users.
The obvious counterargument is performance. That would be a good argument if we spent a PhD salary on compute. From what I have seen, the R3BRoot analysis code is not heavily optimized. I have fixed code where string processing was used to look up Califa crystals in an event loop. I have seen code perform an operation in O(n**2) which was clearly doable in O(n*log(n)) -- not because the author had argued that n would be small, but because they simply did not know the relevant STL data types.
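To illustrate the complexity point with a made-up example (not the actual code in question): matching hits from two detectors by channel number. Indexing one side first turns a quadratic double loop into a linear pass (std::map or a sort would give O(n*log(n)); a hash map gives O(n)).

```python
import random

# toy hit lists: (channel, time); channels are unique within each list
hits_a = [(ch, random.random()) for ch in range(0, 1000)]
hits_b = [(ch, random.random()) for ch in range(0, 2000, 2)]

# O(n**2): rescan all of hits_b for every single hit in hits_a
def match_quadratic(a, b):
    return [(ha, hb) for ha in a for hb in b if ha[0] == hb[0]]

# O(n): index one side by channel once, then look up in constant time
def match_indexed(a, b):
    by_channel = {hb[0]: hb for hb in b}
    return [(ha, by_channel[ha[0]]) for ha in a if ha[0] in by_channel]

# both produce the same pairs, in hits_a order
assert match_quadratic(hits_a, hits_b) == match_indexed(hits_a, hits_b)
```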
One of the very few design decisions of FairRoot I agree with is not insisting on static type safety for data passed between FairTasks. Doing otherwise would basically force people to use a global struct which contains all of their datatypes as fields and refer to these members by name.
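What name-based exchange buys you can be sketched in a few lines of Python. This is a hypothetical EventStore for illustration, not FairRoot's actual implementation: stages publish and look up collections by string key, so adding a detector means adding a name, not a field in some global struct every task must see.

```python
class EventStore:
    """Hypothetical sketch of name-keyed data exchange between stages."""
    def __init__(self):
        self._branches = {}

    def register(self, name, collection):
        self._branches[name] = collection

    def get(self, name):
        return self._branches[name]  # KeyError doubles as "branch missing"

store = EventStore()
# an upstream "task" publishes its output under a name
store.register("CalifaHitData", [{"energy": 320.0, "phi": 12.5}])
# a downstream task retrieves it by the same name; no compile-time type needed
hits = store.get("CalifaHitData")
```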
I will grant that type errors in Python can at times be annoying. However, there is nothing stopping anyone from using type annotations.
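For example, an annotated function costs nothing at runtime, but a checker such as mypy would flag a call like `calibrate("1024", ...)` before the code ever runs (the function here is a made-up example, not part of any real calibration code):

```python
# Annotations document intent; tools like mypy check them statically.
def calibrate(channel: int, slope: float, offset: float) -> float:
    return slope * channel + offset

energy: float = calibrate(1024, 0.25, -3.0)
```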
I am using ROOT for plotting and histogramming. Granted, this is one of the areas where ROOT is especially weird (a TH2I is a TH1, no separation of data and visual representation, the coordinate axes are a property of a histogram to the point that TGraph will create a histogram just to plot them, while which axes are logscale is a property of the pad, etc), but it is a weirdness most of us have used for half our lives.
Because of this rather limited scope, nothing would stop someone from replacing online_base.py with a backend which uses matplotlib and jupyter or whatever.
There are clear advantages to running as a web server. JSROOT mostly does what I need it to do. Again, I do not have a deep ideological conviction here, if you write interface code for another backend I might switch.
| Comparison | FR/R3BRoot | pyh101/pyjsroot |
|---|---|---|
| Scripting language | ROOT Cling | Python |
| Compiled language | C++ | C/C++ |
| Adding compiled components | Somewhat painful (RootClassDef, LinkDef.h, LD_LIBRARY_PATH) | Painful (CPython modules) |
| Adding interpreted code | Somewhat painful (C++) | Easy (Python) |
| Event serialization | TTree of TClonesArrays of custom TObjects | Not yet (pickle?) |
| Parameter serialization | FairParams, based on custom TObjects | json or yaml probably |
| Support for monte carlo simulation | TVirtualMC (with the backend being Geant4 99% of the time) | Nope! ("do one thing, do it well") Use Geant4, write json |
| New classes required whenever | ... you add a new detector | ... you want new functionality (e.g. a new calibration procedure) |
| Support for detectors | All of them! | Whatever you add |
| LOC | 114k (FairRoot) + 280k (R3BRoot) + 10k (R3BParams) | 4600 (ext_data_client) + 1300 (pyh101) + 1100 (jsonline) |
| Most common pitfall | forgot a change in a copy+rename orgy | lambda.md |