Description
We need nexus package/repo to support the following:
- Registering benchmark experiment packages
- Registering benchmarks on a per-model basis
Registering means (a) putting information in the correct location, (b) validating that information, and (c) merging to main.
See the benchmark system requirements for terminology.
Motivation
These requirements are set out in the benchmarking requirements.
Proposed Solution
See [benchmark system design] for more details.
- Implement a mechanism to register benchmark packages with nexus
  a. The nexus.yml has a section listing the benchmark packages it requires (the packages that provide the experiments). This is a list of package names (PyPI), GitHub repo URLs, or relative file paths (Python packages in the repo); OR this is just a file called requirements_text.txt
  b. By some mechanism, check that the listed benchmark packages can be installed together
- Implement a mechanism to register benchmarks with nexus
  a. Extend the nexus model dir spec to include a sub-dir for benchmarks
     - Each benchmark is in its own dir and, at minimum, contains an ado space.yaml
     - Optional: Extend nexus model.yaml with the names/locations of these benchmarks
  b. Extend the nexus CLI so it can validate the package structure
  c. By some mechanism, check that each benchmark's space.yaml is valid
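For illustration, a repo layout satisfying the points above might look like the following sketch (directory names and nesting are assumptions reconciling 2a with the `packages/NAME/` paths used later in this issue, not a finalized spec):

```
nexus-repo/
├── nexus.yml                      # or requirements_text.txt: benchmark packages (1a)
└── packages/
    └── NAME/                      # a nexus model dir
        ├── model.yaml             # optional: names/locations of its benchmarks
        ├── requirements_test.txt  # requirements checked for co-installability (1b)
        └── benchmarks/
            ├── benchmark-a/
            │   └── space.yaml     # minimum: each benchmark dir holds an ado space.yaml
            └── benchmark-b/
                └── space.yaml
```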
The mechanisms for 1b and 2c must be related, as 1b is required for 2c.
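For 1a, the nexus.yml section might look like the following sketch (the key name `benchmark-packages` and all entries are illustrative assumptions; only the three entry forms come from this issue):

```yaml
# nexus.yml (fragment) -- key name and entries are hypothetical
benchmark-packages:
  - some-benchmarks-on-pypi                     # package name (PyPI)
  - https://github.com/example-org/benchmarks   # GitHub repo URL
  - ./packages/local-benchmarks                 # relative path to a package in this repo
```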
Simple mechanism:
- `nexus validate package NAME`
  - `uv pip install -r packages/NAME/requirements_test.txt` -> if this cannot be installed, benchmark package registration fails AND benchmark registration is not possible
  - for each space.yaml under packages/NAME/
    - `ado create -f space space.yaml --dry-run` -> if any validation fails, the benchmark cannot be registered
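The simple mechanism above could be sketched as a helper that enumerates the commands a hypothetical `nexus validate package NAME` implementation would run (the function name and layout are assumptions; the commands mirror the ones in this issue):

```python
import pathlib


def validation_commands(package_dir: pathlib.Path) -> list[list[str]]:
    """Build the commands a hypothetical `nexus validate package` would run.

    First install the package's requirements; if that fails, registration
    fails. Then dry-run-validate every space.yaml under the package dir.
    """
    commands = [
        ["uv", "pip", "install", "-r", str(package_dir / "requirements_test.txt")],
    ]
    for space in sorted(package_dir.rglob("space.yaml")):
        commands.append(["ado", "create", "-f", "space", str(space), "--dry-run"])
    return commands
```

A real implementation would execute these with `subprocess` and stop at the first failure, since an uninstallable package makes the space validation step meaningless.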
Additional Context
A key point is that ado provides a mechanism for reading benchmark packages and finding experiments. However, this relies on Python entry points (how ado discovers the packages), which in turn relies on the package being installed. There is no external list of experiments that can be read, as the list can be dynamically generated from experiment decorators or by an actuator.
The issue is package installation: to list all experiments in all nexus packages, they must all be installed, and hence must not have dependency conflicts.
Similarly, validating a space that uses a particular experiment requires that experiment to be installed.