
Add _CellView helper subclass to Vector datastructure for Imaging Infrastructure needs#179

Closed
darshan-mali wants to merge 2 commits into electronmicroscopy:dev from darshan-mali:imaging3

@darshan-mali
Collaborator

@darshan-mali darshan-mali commented Feb 21, 2026

What does this PR do?

Implements the helper subclass _CellView for Vector.
It enables access to Vector data by cell, in addition to the current default access by field.
This is an essential infrastructure change for automating imaging analysis.

For example, if you create a Vector datastructure as follows:

from quantem.core.datastructures.vector import Vector

atoms = Vector.from_shape(
    shape=(2,),
    fields=["x", "y", "a", "b"],
    units=["px", "px", "ind", "ind"],
)

The current infrastructure only allows data access by field, as shown:
x_data = atoms["x"]

This PR enables access by cell, as well as field access within a cell, as follows:

atoms_type0 = atoms[0]          # _CellView for cell 0
atoms_type0_x = atoms[0]["x"]   # field "x" within cell 0 only

This is important when dealing with a large number of atoms and their data, especially for automated polarization measurements. More details are given in the attached notebook.

The PR also includes one basic pytest test for _CellView.

Modified files

  • src/quantem/core/datastructures/vector.py - Added the _CellView helper subclass. Updated __getitem__() to return a _CellView object depending on the indexing used. Also added shape validation to the from_shape() method.
  • tests/datastructures/test_vector.py - Added a basic pytest test for _CellView functionality.
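The description mentions shape validation for from_shape() without spelling out the rules. One plausible version, written as a standalone helper, is sketched below. The function name `validate_shape` and the specific rules (an int or a tuple/list of non-negative ints, with at least one dimension) are assumptions for illustration, not the checks actually added in vector.py.

```python
from numbers import Integral


def validate_shape(shape):
    """Hypothetical shape check for a from_shape-style constructor.

    Accepts an int or a tuple/list of ints and returns a normalized tuple.
    The rules in quantem's Vector.from_shape may differ.
    """
    # A bare integer is treated as a 1D shape.
    if isinstance(shape, Integral) and not isinstance(shape, bool):
        shape = (shape,)
    if not isinstance(shape, (tuple, list)):
        raise TypeError(f"shape must be an int or a tuple of ints, got {type(shape).__name__}")
    if len(shape) == 0:
        raise ValueError("shape must have at least one dimension")
    out = []
    for dim in shape:
        # Reject bools and non-integers such as 2.5.
        if isinstance(dim, bool) or not isinstance(dim, Integral):
            raise TypeError(f"shape entries must be ints, got {dim!r}")
        if dim < 0:
            raise ValueError(f"shape entries must be non-negative, got {dim}")
        out.append(int(dim))
    return tuple(out)


for bad in [(-1,), (2.5,), (), "2"]:
    try:
        validate_shape(bad)
    except (TypeError, ValueError) as err:
        print(f"{bad!r}: {err}")
```

Normalizing to a tuple up front means the rest of the constructor can rely on a single representation of the fixed dimensions.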

A brief example can be found in this notebook:
Vector_example.ipynb

What should the reviewer(s) do?

This is a PR into core
Ensure nothing else breaks because of this

  • No other files in core are affected
  • This is not an infrastructural liability
  • Functionality and implementation of _CellView is correct and as expected (Notebook provided with simple example).

Checklist items:

  • This PR affects internal functionality only (no user-facing change).

Reviewer checklist

  • Tests pass and are appropriate for the changes
  • Documentation and examples are sufficient and clear
  • Imported or adapted code has a compatible license
  • The implementation matches the stated intent of the contribution

@darshan-mali darshan-mali changed the title Add _CellView to Vector Add _CellView helper subclass to Vector datastructure for Imgaing Infrastructure needs Feb 21, 2026
@darshan-mali darshan-mali changed the title Add _CellView helper subclass to Vector datastructure for Imgaing Infrastructure needs Add _CellView helper subclass to Vector datastructure for Imaging Infrastructure needs Feb 21, 2026
@arthurmccray
Collaborator

@cophus Okay, so I feel a little bad for only looking at Vector now, as I know this code was originally written many months ago, but I do have some high-level questions about it.

Long story short, Vector isn't behaving the way I would expect it to. This could be due to my expectations being wrong (I think adding a full tutorial notebook will be helpful), but I still think some things could be done more cleanly.

My general comments:

  • At a very high level, isn't Vector just a dataframe? It feels kind of like we're just reimplementing a list of pandas or Polars dataframes.
    • A simpler implementation that I think achieves all of the goals of Vector is a wrapper class built around a single dataframe (e.g. using Polars) that contains all the data. "Fixed" dimensions (e.g. atom species) would just be another column in this scheme, and if we want to keep indexing with square brackets we would just adjust the __getitem__ method.
    • Is the purpose of Vector to avoid adding dependencies, or does it add features not available in existing libraries? It would be good to clarify our goals at the outset.
  • We do need a proper Vector tutorial that can be added to the core module of the quantem-tutorials repo (I think Colin has an old version of a notebook that could be adapted for this purpose)
  • I find the way FieldView and CellView work rather confusing (see my screenshot below).
    [Screenshot: FieldView/CellView behavior]
  • The tutorial should cover a few additional features:
    • Clarify the distinction between fixed and ragged dimensions, how each is specified, etc.
    • appending and popping vectors
    • adding new fields, either as empty columns or as calculated from other existing fields
  • Do the fixed dimensions not support names? I think they probably should if we're allowing fields for the "columns".
  • We should add a __len__ method that returns the size of the fixed dimensions, for easy iteration.
  • Shouldn't Vector.shape be (num_fixed_dims, num_ragged_dims)?
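The single-dataframe alternative suggested above can be sketched in a few lines. This is a hypothetical illustration only: it uses pandas rather than Polars purely for familiarity, and the class name `FrameVector`, the `"cell"` column, and the `from_cells`/`add_field` helpers are invented for the example. It shows a fixed dimension stored as an ordinary column, square-bracket access by field or by cell, a `__len__` over the fixed dimension, and adding a field either empty or computed from existing fields.

```python
import numpy as np
import pandas as pd


class FrameVector:
    """Hypothetical Vector-like wrapper around a single pandas DataFrame.

    The fixed dimension (one "cell" index, e.g. atom species) is an
    ordinary integer column; every other column is a field.
    """

    def __init__(self, df, num_cells):
        if "cell" not in df.columns:
            raise ValueError("DataFrame needs a 'cell' column")
        self.df = df.reset_index(drop=True)
        self.num_cells = num_cells

    @classmethod
    def from_cells(cls, cells):
        # cells: one dict per cell mapping field name -> values;
        # cells may hold different numbers of rows (ragged).
        frames = [pd.DataFrame(cell).assign(cell=i) for i, cell in enumerate(cells)]
        return cls(pd.concat(frames, ignore_index=True), num_cells=len(cells))

    def __len__(self):
        # Size of the fixed dimension, so range(len(v)) visits every cell.
        return self.num_cells

    def __getitem__(self, key):
        if isinstance(key, str):
            # Field access: one column across all cells.
            return self.df[key]
        if isinstance(key, (int, np.integer)):
            if not 0 <= key < self.num_cells:
                raise IndexError(f"cell index {key} out of range")
            # Cell access: the rows belonging to that cell (a filtered copy).
            rows = self.df[self.df["cell"] == key]
            return rows.drop(columns="cell").reset_index(drop=True)
        raise TypeError(f"unsupported key type: {type(key).__name__}")

    def add_field(self, name, values=np.nan):
        # New column: scalar fill (NaN by default) or values computed from other fields.
        self.df[name] = values


atoms = FrameVector.from_cells([
    {"x": [1.0, 3.0], "y": [2.0, 4.0]},
    {"x": [5.0], "y": [6.0]},
])
atoms.add_field("r", np.hypot(atoms["x"], atoms["y"]))  # computed field
atoms.add_field("label")                                # empty (NaN) field
for i in range(len(atoms)):
    print(i, atoms[i]["x"].tolist())
```

One consequence of this design: cell access returns a filtered copy rather than a view, so writes to `atoms[0]` do not propagate back to the underlying frame. A real wrapper would need to decide whether that is acceptable or route writes through the parent.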

@darshan-mali
Collaborator Author

Closing this in favor of PR #184.


2 participants