[CPBench] Benchmarking Wrapper #1168

Open
lehendo wants to merge 2 commits into sunlabuiuc:master from lehendo:cpbench
@lehendo lehendo commented Jun 26, 2026

Collaborator

Benchmarking wrapper for the CPBench project.

Usage:

List what's available:

python benchmarks/cpbench/benchmark.py --list

Full benchmark run (trains from scratch, tests at α = 0.01, 0.05, 0.10, 0.20):

python benchmarks/cpbench/benchmark.py \
    --task sleep_staging_isruc \
    --method label \
    --data-path /path/to/ISRUC-I

Skip training with a saved checkpoint:

python benchmarks/cpbench/benchmark.py \
    --task sleep_staging_isruc \
    --method label \
    --data-path /path/to/ISRUC-I \
    --checkpoint /path/to/model.pth

Quick smoke-test (uses a tiny data subset, runs only α = 0.10):

python benchmarks/cpbench/benchmark.py \
    --task sleep_staging_isruc \
    --method base \
    --data-path /path/to/ISRUC-I \
    --dev

Save results to JSON for aggregating across runs:

python benchmarks/cpbench/benchmark.py \
    --task sleep_staging_isruc \
    --method label \
    --data-path /path/to/ISRUC-I \
    --output results/isruc_label.json

fbonc commented Jul 3, 2026

Contributor

Hello ac!

Main things I ran into, as requested:

1. The wrapper does not fully match current PyHealth APIs

A few crashes came from the wrapper assuming all datasets/tasks/models are built the same way.

For example, the wrapper calls something like dataset.set_task(task_fn), where task_fn is a separate function that turns raw records into ML samples. That does not seem to work uniformly across datasets (anymore?).

SleepEDF needed the default task path instead: dataset.set_task()

ISRUC failed because its dataset object did not have set_task at all:
AttributeError: 'ISRUCDataset' object has no attribute 'set_task'
ISRUC is also listed as supported, but crashes due to this.

SparcNet also failed because the wrapper passed arguments that current SparcNet does not accept:
TypeError: SparcNet.__init__() got an unexpected keyword argument 'feature_keys'

So I think the main issue is that the wrapper is written against a mix of PyHealth interfaces that not all datasets and models share. Note: I did not test every dataset and model, so the list above is not comprehensive and there are likely more cases.

3. SleepEDF preprocessing hit a data edge case

SleepEDF crashed because one recording did not contain “Sleep stage 4”:
ValueError: No matching events found for Sleep stage 4

Adding on_missing="ignore" to the MNE Epochs(...) call let it continue.
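
An alternative to on_missing="ignore" would be to drop absent labels from the event mapping before building the epochs. The sketch below assumes the preprocessing already has an MNE-style events array whose third column holds the event codes; present_event_ids is a hypothetical helper, and the mapping shown is only an example:

```python
from typing import Dict, Iterable


def present_event_ids(
    event_id: Dict[str, int], event_codes: Iterable[int]
) -> Dict[str, int]:
    """Keep only the labels whose code actually occurs in this recording."""
    seen = set(event_codes)
    return {label: code for label, code in event_id.items() if code in seen}


# Example mapping; this recording has no "Sleep stage 4" events.
wanted = {"Sleep stage 3": 3, "Sleep stage 4": 4}
codes_in_recording = [1, 2, 3, 3]
print(present_event_ids(wanted, codes_in_recording))

# Assumed usage with MNE (not executed here):
#   event_id = present_event_ids(event_id, events[:, 2])
#   epochs = mne.Epochs(raw, events, event_id=event_id, ...)
```

Either way, the label-to-class mapping should stay global across recordings so that a missing stage in one recording does not shift class indices.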

4. The script prints results but can fail when saving them

After local patches, I got a capped SleepEDF run to print a result table, but then saving failed because the output folder did not exist:

FileNotFoundError: 'results/sleepedf_label_smoke.json'

Probably just needs to create the parent directory before writing the JSON.
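
A minimal sketch of that fix, assuming the results are a JSON-serializable object; save_results is a hypothetical name for whatever the wrapper uses to write the file:

```python
import json
from pathlib import Path
from typing import Any


def save_results(results: Any, output_path: str) -> Path:
    """Write results as JSON, creating missing parent directories first."""
    path = Path(output_path)
    # exist_ok=True makes this a no-op when the folder is already there.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    return path
```

Creating the directory at argument-parsing time instead would surface permission problems before the long training run rather than after it.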

5. Maybe a test mode would be useful?

SleepEDF processed around 415k samples, so every wrapper bug took a while to reach. I added a local cap like 1000 samples per split, which made it much easier to debug the full path quickly.
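
The local cap was roughly along these lines. This is a sketch rather than the exact patch, assuming each split is an indexable sequence of samples; cap_samples and its parameters are hypothetical names:

```python
import random
from typing import List, Optional, Sequence, TypeVar

T = TypeVar("T")


def cap_samples(
    samples: Sequence[T], max_samples: Optional[int], seed: int = 0
) -> List[T]:
    """Return at most max_samples items, chosen reproducibly.

    max_samples=None disables the cap.
    """
    if max_samples is None or len(samples) <= max_samples:
        return list(samples)
    rng = random.Random(seed)
    # Sample indices rather than taking the first N, then restore order.
    indices = sorted(rng.sample(range(len(samples)), max_samples))
    return [samples[i] for i in indices]


capped = cap_samples(list(range(415)), max_samples=100)
print(len(capped))
```

Sampling at random instead of taking the first N items is a deliberate choice here: sleep recordings are ordered in time, so a prefix could contain only one or two stages.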

7. The PR mentions checkpoint loading, but I did not see it in the CLI

Would be helpful because CP debugging should ideally be possible without retraining the model every time.
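
A sketch of how the --checkpoint flag from the PR description could be wired up, assuming the wrapper uses argparse. train_fn and load_fn are hypothetical callables standing in for the wrapper's real training and loading code:

```python
import argparse
from typing import Any, Callable, List, Optional


def parse_args(argv: Optional[List[str]] = None) -> argparse.Namespace:
    parser = argparse.ArgumentParser(description="CPBench wrapper (sketch)")
    parser.add_argument(
        "--checkpoint",
        default=None,
        help="path to saved model weights; training is skipped when given",
    )
    return parser.parse_args(argv)


def get_model(
    args: argparse.Namespace,
    train_fn: Callable[[], Any],
    load_fn: Callable[[str], Any],
) -> Any:
    """Load from the checkpoint if one was given, otherwise train."""
    if args.checkpoint is not None:
        return load_fn(args.checkpoint)
    return train_fn()


args = parse_args(["--checkpoint", "model.pth"])
print(get_model(args, lambda: "trained", lambda p: f"loaded from {p}"))
```

Keeping the train-or-load decision in one small function makes it easy to test the calibration and evaluation steps separately from training.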


Happy to help with the wrapper if you'd like. Peace.
