
feat: extend nexus package to support registration of benchmark package and benchmarks #68

@michael-johnston

Description


We need nexus package/repo to support the following:

  1. Registering benchmark experiment packages
  2. Registering benchmarks on a per model basis

Registering means (a) putting the information in the correct location, (b) validating that information, and (c) merging to main.

See the benchmark system requirements for terminology.

Motivation

These requirements are set out in the benchmarking requirements.

Proposed Solution


See [benchmark system design] for more details.

  1. Implement mechanism to register benchmark packages with nexus
    a. The nexus.yml has a section listing the benchmark packages (with their experiments) it requires. Each entry is a package name (PyPI), a GitHub repo URL, or a relative file path (a Python package in the repo); alternatively, this is just a file called requirements_test.txt
    b. By some mechanism, it is checked that the list of benchmark packages can be installed together
  2. Implement mechanism to register benchmarks with nexus
    a. Extend the nexus model dir spec to include a sub-dir for benchmarks
    - Each benchmark is in its own dir and at minimum contains an ado space.yaml
    - Optional: Extend nexus model.yaml with the names/locations of these benchmarks
    b. Extend the nexus cli so it can validate the package structure
    c. By some mechanism, check that each benchmark's space.yaml is valid
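As a sketch of 1a, the nexus.yml section might look like the following (key names and layout are assumptions for illustration, not a defined schema):

```yaml
# Hypothetical nexus.yml fragment; key names are assumptions
benchmark-packages:
  - name: some-benchmarks                    # a PyPI package name
  - url: https://github.com/org/bench-repo   # a GitHub repo URL
  - path: packages/local-bench               # relative path to a Python package in this repo
```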

The mechanisms for 1b and 2c must be related, as 1b is a prerequisite for 2c.

Simple mechanism:

  1. `nexus validate package NAME`
    • `uv pip install -r packages/NAME/requirements_test.txt` -> If this can't be installed, benchmark package registration fails AND benchmark registration is not possible
    • for each space.yaml under packages/NAME/
      - `ado create space -f space.yaml --dry-run` -> If any validation fails, you can't register the benchmark
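The structural part of the mechanism above can be sketched in Python (function and file names are assumptions; the real CLI may differ). It only checks the package layout; the install check (1b) and the `ado ... --dry-run` check (2c) would be shelled out where the comments indicate:

```python
from pathlib import Path


def validate_package(package_dir: str) -> list[str]:
    """Return a list of validation errors for one benchmark package (empty = valid)."""
    root = Path(package_dir)
    errors = []

    # 1b: the package must declare its requirements so installability can be
    # checked, e.g. via `uv pip install -r <file>` (not executed in this sketch).
    if not (root / "requirements_test.txt").is_file():
        errors.append("missing requirements_test.txt")

    # 2c: every benchmark dir must contain a space.yaml; each one would then be
    # validated with `ado create ... --dry-run` (not executed in this sketch).
    spaces = sorted(root.rglob("space.yaml"))
    if not spaces:
        errors.append("no space.yaml found under package")

    return errors
```

A package passes this structural check when it declares its requirements and has at least one benchmark dir with a space.yaml; the external-tool checks then decide final registration.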

Additional Context

A key point is that ado provides a mechanism for reading benchmark packages and finding experiments. However, this relies on Python entry points (how ado discovers the packages), which in turn requires the package to be installed. There is no external list of experiments that can be read, since the list is generated dynamically from the experiment decorators or by an actuator.

The issue is package installation: to list all experiments in all nexus packages, they must all be installed, and hence must not have dependency conflicts.

Similarly, validating a space that uses a particular experiment requires that experiment to be installed.

Metadata


Labels

enhancement (New feature or request)
