pyjsroot -- a jsroot server working with pyh101.
- See intro.md for an estimate of how much boilerplate is added with each new detector.
- UCESB is very good at what it does, but its raw output on a mapped level is hard to interpret, especially for `ZERO_SUPPRESS_MULTI`.
- Keeping R3BRoot/r3bsource synced with UCESB is a chore. There is a lot of boilerplate to map all arrays.
- Using r3bsource (e.g. from the "macro") seems like a particularly mean-spirited joke -- I have to locally declare a struct which contains the memory of all the `ext_h101` structures, and then pass these to the various Readers, e.g.

  ```cpp
  struct EXT_STR_h101_t
  {
      ...
      EXT_STR_h101_LOS_t los;
      ...
  };

  // in main()
  auto los_reader = new R3BLosReader(&ucesb_struct.los, offsetof(EXT_STR_h101_t, los));
  ```

  Note how R3BRoot has a class to read LOS data from UCESB, precisely so that the user does not have to deal with `struct EXT_STR_h101_LOS_t`, but nevertheless requires the user to manually decide on a memory layout for their overall ucesb struct.
- UCESB converts the electronics channels to physics channels, which is great unless you require the TDC trigger time of the electronics module for that channel, in which case you need to break the abstraction/mapping and figure out which trigger time is actually the one you want to subtract. This is typically solved by transporting mapping information from the unpacker to R3BRoot, using an autogenerated header defining a mapping array if you are lucky, or a FairParam if you are not.
- Once the data is converted to R3BRoot-native per-detector-class TClonesArrays by per-detector classes, it is further processed, which likely involves per-detector calibration parameter container classes (not for tcal, but for example for trigger mapping). My opinions about all of this are well known. Let us instead talk about the final step, which is histogramming.
Most histogramming code I see in nuclear physics feels deeply unaesthetic to me.
Consider R3BCalifaOnlineSpectra (.h, .cxx) as a typical example. That class fills 31 different histogram entities of various dimensionality. Single histograms are declared as pointers, one-dimensional arrays of histograms as std::vectors of pointers, and multi-dimensional arrays of histograms as C arrays of pointers.
In the simplest case (single histograms), a declaration reads like this:
```cpp
TH2F* fh2_Califa_coinPhi;
```
In the corresponding cxx file, we run over this histogram in different places.
620 lines deep in the Init() method, it is initialized:
```cpp
fh2_Califa_coinPhi =
    R3B::root_owned<TH2F>("fh2_Califa_phi_correlations", "Califa phi correlations", 600, -190, 190, 600, -190, 190);
```
A few more lines are then spent to create a TCanvas, set axis labels and draw it using "COLZ".
The next place we find it is an exceptionally short method, Reset(), which clears all histograms. Our histogram is mentioned 94 lines into that method:
```cpp
fh2_Califa_coinPhi->Reset();
```
The final place it is used is where it is actually filled, 302 lines within the Exec() method:

```cpp
fh2_Califa_coinPhi->Fill(califa_phi[i1], califa_phi[i2]);
```
The lack of a destructor is by design, and the fact that the pointer is uninitialized between object construction and Init() could be fixed fairly easily by using default member initializers. Also, iterating through all the histograms for Reset() could be avoided fairly easily by adding every allocated histogram to a vector. Still, that leaves three different places where the histogram pointer appears, which is about two too many.
My early attempts to generate a single definition of a histogram included rewriting, more than a decade ago, the Califa GO4 code around a struct HistogramAxis, which contained information on range, binning, and axis labeling, plus a function pointer to extract the value to be histogrammed. You would pass one or more of these axis objects to the constructor of your histogramming class, and the class would histogram just the quantities you were interested in.
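The HistogramAxis idea translates naturally to Python. Here is a minimal sketch of the concept (all names are hypothetical, not the actual GO4 or pyjsroot API, and a toy dict stands in for a ROOT histogram): the histogram is declared in exactly one place, and booking, filling, and resetting all derive from the axis objects.

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class HistogramAxis:
    """One axis: binning, labeling, and how to extract the value from an event."""
    label: str
    bins: int
    low: float
    high: float
    extract: Callable  # maps an event to the value for this axis

class AutoHistogram:
    """Declared once; filling and resetting need no further per-histogram code."""
    def __init__(self, name, *axes):
        self.name = name
        self.axes = axes
        self.counts = {}  # toy storage: tuple of bin indices -> count

    def _bin(self, axis, value):
        if not (axis.low <= value < axis.high):
            return None  # out of range, discard
        return int((value - axis.low) / (axis.high - axis.low) * axis.bins)

    def fill(self, event):
        idx = tuple(self._bin(a, a.extract(event)) for a in self.axes)
        if None not in idx:
            self.counts[idx] = self.counts.get(idx, 0) + 1

    def reset(self):
        self.counts.clear()

# usage: the 2D phi-correlation histogram from above, defined in one place
phi1 = HistogramAxis("phi1 / deg", 600, -190, 190, lambda ev: ev["phi"][0])
phi2 = HistogramAxis("phi2 / deg", 600, -190, 190, lambda ev: ev["phi"][1])
h = AutoHistogram("coinPhi", phi1, phi2)
h.fill({"phi": [10.0, -20.0]})
```

The point is not the toy storage but the shape of the interface: a class that owns its histograms can reset or serialize all of them generically, instead of touching each pointer in three separate methods.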
A few years later, I implemented something similar for R3BRoot collections. Only this time, it was based on template metaprogramming, which meant it could have been turned into something optimizable to zero runtime overhead. Alas, template error messages are generally not very user-friendly. Still, it was probably the most fun I ever had with a C++ compiler.
This time, I am aiming for something a bit more practical.
I based this on pyh101, a recent framework for handling ucesb unpacker data from Python. I am of the firm opinion that C++ is not a suitable language for data analysis within R3B. While there are certainly people in the R3B collaboration whose C++ I would trust as much as anyone's (which is not as faint a praise as you might think), most of the R3BRoot contributors do not have an adequate knowledge of C++, and the review process is basically non-existent. To be fair, "adequate knowledge of C++" is a big ask -- it covers everything from "don't put a leading zero in front of your integer constants unless you like octal numbers" over constness, templates, how to get sane multi-dimensional arrays, and exception handling, to a ton of similarly esoteric topics. Take object ownership -- I have seen a ton of code to the effect of:
```cpp
R3BFoo::Init()
{
    fBarCA = dynamic_cast<...>(FairRootManager::Instance()->GetObject("Bar"));
}

R3BFoo::~R3BFoo()
{
    if (fBarCA)
        delete fBarCA;
}
```
Whoever writes such code clearly has an insufficient understanding of object ownership. FairRootManager::GetObject() does not return a unique pointer per call for you to do with as you please; it returns a pointer to an object which is also used in other places, and you have no business deleting it. Cargo-cult programming on the level of "I have seen others delete their pointer members in the destructor, hence I should call delete on all of my member pointers" does not cut it. The unnecessary if is just adding insult to injury.
So as I was starting from scratch in any case, I recognized that getting contributors to the point where they can contribute safe Python is much easier than getting them to the point where they can contribute safe C++. Less stuff to unlearn, for one thing.
As far as alternatives to C++ go, Python has some clear advantages:
- It is very general-purpose. From running I2C on a RasPi, over numerical algorithms from numpy/scipy, to neural networks, it can be used for all kinds of purposes.
- It is a straightforward imperative language, which is what people are most likely to have experience with.
- It offers a well-documented interface for implementing modules in C.
- ROOT offers Python bindings. While the merits of ROOT are debatable (it improved a lot with ROOT6, but a lot of outdated, weird design choices are grandfathered in), I did not want to change everything at once. I have long felt that PyROOT is the preferable way to interact with ROOT. ROOT's devil-may-care attitude towards type safety ("anything is a TObject") plays much better with Python's duck typing than with C++'s strict typing requirements. All the `dynamic_cast` operators are just not very straightforward for most users.
The obvious counterargument is performance. That would be a good argument if we spent a PhD salary on compute. From what I have seen, the R3BRoot analysis code is not heavily optimized. I have fixed code where string processing was used to look up Califa crystals in an event loop. I have seen code perform an operation in O(n**2) which was clearly doable in O(n*log(n)) -- not because the author had argued that n would be small, but because they simply did not know the relevant STL data types.
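To illustrate the complexity point with a made-up example (not the actual code in question): matching hits from two detectors by channel number. Indexing one side first turns a quadratic double loop into a linear pass (std::map or a sort would give O(n*log(n)); a hash map gives O(n)).

```python
import random

# toy hit lists: (channel, time); channels are unique within each list
hits_a = [(ch, random.random()) for ch in range(0, 1000)]
hits_b = [(ch, random.random()) for ch in range(0, 2000, 2)]

# O(n**2): rescan all of hits_b for every single hit in hits_a
def match_quadratic(a, b):
    return [(ha, hb) for ha in a for hb in b if ha[0] == hb[0]]

# O(n): index one side by channel once, then look up in constant time
def match_indexed(a, b):
    by_channel = {hb[0]: hb for hb in b}
    return [(ha, by_channel[ha[0]]) for ha in a if ha[0] in by_channel]

# both produce the same pairs, in hits_a order
assert match_quadratic(hits_a, hits_b) == match_indexed(hits_a, hits_b)
```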
One of the very few design decisions of FairRoot I agree with is not insisting on static type safety for data passed between FairTasks. Doing otherwise would basically force people to use a global struct which contains all of their datatypes as fields and refer to these members by name.
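What name-based exchange buys you can be sketched in a few lines of Python. This is a hypothetical EventStore for illustration, not FairRoot's actual implementation: stages publish and look up collections by string key, so adding a detector means adding a name, not a field in some global struct every task must see.

```python
class EventStore:
    """Hypothetical sketch of name-keyed data exchange between stages."""
    def __init__(self):
        self._branches = {}

    def register(self, name, collection):
        self._branches[name] = collection

    def get(self, name):
        return self._branches[name]  # KeyError doubles as "branch missing"

store = EventStore()
# an upstream "task" publishes its output under a name
store.register("CalifaHitData", [{"energy": 320.0, "phi": 12.5}])
# a downstream task retrieves it by the same name; no compile-time type needed
hits = store.get("CalifaHitData")
```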
I will grant that type errors in Python can at times be annoying. However, there is nothing stopping anyone from using type annotations.
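For example, an annotated function costs nothing at runtime, but a checker such as mypy would flag a call like `calibrate("1024", ...)` before the code ever runs (the function here is a made-up example, not part of any real calibration code):

```python
# Annotations document intent; tools like mypy check them statically.
def calibrate(channel: int, slope: float, offset: float) -> float:
    return slope * channel + offset

energy: float = calibrate(1024, 0.25, -3.0)
```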
I am using ROOT for plotting and histogramming. Granted, this is one of the areas where ROOT is especially weird (a TH2I is a TH1, no separation of data and visual representation, the coordinate axes are a property of a histogram to the point that TGraph will create a histogram just to plot them, while which axes are logscale is a property of the pad, etc), but it is a weirdness most of us have used for half our lives.
Because of this rather limited scope, nothing would stop someone from replacing online_base.py with a backend which uses matplotlib and jupyter or whatever.
There are clear advantages to running as a web server. JSROOT mostly does what I need it to do. Again, I do not have a deep ideological conviction here, if you write interface code for another backend I might switch.
| Comparison | FR/R3BRoot | pyh101/pyjsroot |
|---|---|---|
| Scripting language | ROOT Cling | Python |
| Compiled language | C++ | C/C++ |
| Adding compiled components | Somewhat painful (RootClassDef, LinkDef.h, LD_LIBRARY_PATH) | Painful (CPython modules) |
| Adding interpreted code | Somewhat painful (C++) | Easy (Python) |
| Event serialization | TTree of TClonesArrays of custom TObjects | Not yet (pickle?) |
| Parameter serialization | FairParams, based on custom TObjects | json or yaml probably |
| Support for monte carlo simulation | TVirtualMC (with the backend being Geant4 99% of the time) | Nope! ("do one thing, do it well") Use Geant4, write json |
| New classes required whenever | ... you add a new detector | ... you want new functionality (e.g. a new calibration procedure) |
| Support for detectors | All of them! | Whatever you add |
| LOC | 114k (FairRoot) + 280k (R3BRoot) + 10k (R3BParams) | 4600 (ext_data_client) + 1300 (pyh101) + 1100 (jsonline) |
| Most common pitfall | forgot a change in a copy+rename orgy | lambda.md |