[CPBench] Benchmarking Wrapper by lehendo · Pull Request #1168 · sunlabuiuc/PyHealth

lehendo · 2026-06-26T08:23:58Z

Benchmarking wrapper for the CPBench project.

Usage:

List what's available:

python benchmarks/cpbench/benchmark.py --list

Full benchmark run (trains from scratch, tests at α = 0.01, 0.05, 0.10, 0.20):

python benchmarks/cpbench/benchmark.py \
    --task sleep_staging_isruc \
    --method label \
    --data-path /path/to/ISRUC-I

Skip training with a saved checkpoint:

python benchmarks/cpbench/benchmark.py \
    --task sleep_staging_isruc \
    --method label \
    --data-path /path/to/ISRUC-I \
    --checkpoint /path/to/model.pth

Quick smoke-test (uses a tiny data subset, runs only α = 0.10):

python benchmarks/cpbench/benchmark.py \
    --task sleep_staging_isruc \
    --method base \
    --data-path /path/to/ISRUC-I \
    --dev

Save results to JSON for aggregating across runs:

python benchmarks/cpbench/benchmark.py \
    --task sleep_staging_isruc \
    --method label \
    --data-path /path/to/ISRUC-I \
    --output results/isruc_label.json

fbonc · 2026-07-03T22:15:44Z

Hello ac!

Main things I ran into, as requested:

1. The wrapper does not fully match current PyHealth APIs

A few crashes came from the wrapper assuming all datasets/tasks/models are built the same way.

For example, the wrapper calls something like dataset.set_task(task_fn), where task_fn is a separate function that turns raw records into ML samples. That does not seem to work uniformly across datasets (perhaps it no longer does).

SleepEDF needed the default task path instead: dataset.set_task()

ISRUC failed because its dataset object did not have set_task at all:
AttributeError: 'ISRUCDataset' object has no attribute 'set_task'
ISRUC is listed as supported, but it crashes because of this.

SparcNet also failed because the wrapper passed arguments that current SparcNet does not accept:
TypeError: SparcNet.__init__() got an unexpected keyword argument 'feature_keys'

So I think the main issue is that the wrapper is written against a mixed PyHealth interface that not all datasets and models share. Note: I did not test every dataset and model, so the list above is not comprehensive and there are likely more mismatches.
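One way to make the wrapper tolerant of these interface differences is to check what each class actually accepts before calling it. The sketch below is only an illustration under assumptions: `filter_kwargs`, `apply_task`, `ExampleModel`, and `ExampleDataset` are hypothetical names, not PyHealth APIs, and the stand-in classes only mimic the two failure modes reported above.

```python
import inspect


def filter_kwargs(cls, kwargs):
    """Keep only the keyword arguments that cls.__init__ accepts."""
    params = inspect.signature(cls.__init__).parameters
    # A constructor that declares **kwargs accepts everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


def apply_task(dataset, task_fn=None):
    """Call set_task in the form the dataset supports, or fail clearly."""
    set_task = getattr(dataset, "set_task", None)
    if set_task is None:
        # Mirrors the ISRUC case: the dataset object has no set_task.
        raise NotImplementedError(
            f"{type(dataset).__name__} has no set_task; "
            "it needs a dataset-specific adapter"
        )
    # Mirrors the SleepEDF case: use the default task path when no
    # separate task function is configured.
    return set_task() if task_fn is None else set_task(task_fn)


# Hypothetical stand-ins, used only to exercise the helpers.
class ExampleModel:
    def __init__(self, dataset, hidden_dim=8):
        self.dataset = dataset
        self.hidden_dim = hidden_dim


class ExampleDataset:
    def set_task(self, task_fn=None):
        return "default" if task_fn is None else "custom"
```

With this, a call such as `ExampleModel(**filter_kwargs(ExampleModel, candidate_kwargs))` drops an unsupported key like `feature_keys` instead of raising a `TypeError`. Silently dropping arguments can hide real configuration mistakes, so logging the dropped keys would be a sensible addition.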

3. SleepEDF preprocessing hit a data edge case

SleepEDF crashed because one recording did not contain “Sleep stage 4”:
ValueError: No matching events found for Sleep stage 4

Adding on_missing="ignore" to the MNE Epochs(...) call let it continue.
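An alternative to `on_missing="ignore"` would be to request only the stage labels that a given recording actually contains. The sketch below is library-free and rests on assumptions: `STAGE_EVENT_ID` and `present_event_id` are hypothetical names, and the label-to-code mapping is an illustrative guess rather than the one the wrapper uses.

```python
# Assumed mapping from annotation description to integer code.
# The real mapping used by the preprocessing code may differ.
STAGE_EVENT_ID = {
    "Sleep stage W": 0,
    "Sleep stage 1": 1,
    "Sleep stage 2": 2,
    "Sleep stage 3": 3,
    "Sleep stage 4": 4,
    "Sleep stage R": 5,
}


def present_event_id(descriptions, event_id=None):
    """Return the subset of event_id whose labels occur in descriptions.

    descriptions is any iterable of annotation labels found in one
    recording, so a recording without "Sleep stage 4" simply yields a
    mapping without that key.
    """
    if event_id is None:
        event_id = STAGE_EVENT_ID
    present = set(descriptions)
    return {label: code for label, code in event_id.items() if label in present}
```

Passing the filtered mapping as the event dictionary avoids asking for events that are absent, at the cost of producing a different set of classes per recording, which downstream code has to tolerate.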

4. The script prints results but can fail when saving them

After local patches, I got a capped SleepEDF run to print a result table, but then saving failed because the output folder did not exist:

FileNotFoundError: 'results/sleepedf_label_smoke.json'

Probably just needs to create the parent directory before writing the JSON.
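A minimal sketch of that fix, assuming the results are a JSON-serializable dict; `save_results` is a hypothetical helper name, not a function from the PR:

```python
import json
from pathlib import Path


def save_results(results, output_path):
    """Write results as JSON, creating missing parent directories first."""
    path = Path(output_path)
    # parents=True creates intermediate folders; exist_ok=True makes
    # the call a no-op when the folder is already there.
    path.parent.mkdir(parents=True, exist_ok=True)
    with path.open("w", encoding="utf-8") as f:
        json.dump(results, f, indent=2)
    return path
```

Creating the directory at argument-parsing time, before training starts, would surface a bad output path even earlier.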

5. Maybe a test mode would be useful?

SleepEDF processed around 415k samples, so every wrapper bug took a while to reach. I added a local cap like 1000 samples per split, which made it much easier to debug the full path quickly.
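The local cap described above might look roughly like this. It is a sketch under assumptions: `cap_samples` is a hypothetical helper, and each split is assumed to be an indexable sequence of samples.

```python
import random


def cap_samples(samples, max_samples, seed=0):
    """Return at most max_samples items, chosen reproducibly.

    A max_samples of None disables the cap. The original ordering of
    the selected items is preserved.
    """
    if max_samples is None or len(samples) <= max_samples:
        return list(samples)
    rng = random.Random(seed)  # local RNG, does not touch global state
    indices = sorted(rng.sample(range(len(samples)), max_samples))
    return [samples[i] for i in indices]
```

Applying the same cap to every split, for example `cap_samples(split, 1000)`, keeps the full pipeline intact while making each debugging iteration much shorter.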

7. The PR mentions checkpoint loading, but I did not see it in the CLI

Would be helpful because CP debugging should ideally be possible without retraining the model every time.
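For reference, wiring such an option into the CLI could look like the sketch below. The flag names follow the usage shown in the PR description; `build_parser` and `should_train` are hypothetical names, and the actual model loading is deliberately left out because it depends on how the wrapper saves checkpoints.

```python
import argparse


def build_parser():
    """Build a parser with the flags shown in the PR description."""
    parser = argparse.ArgumentParser(description="CPBench benchmark (sketch)")
    parser.add_argument("--task", required=True)
    parser.add_argument("--method", required=True)
    parser.add_argument("--data-path", required=True)
    parser.add_argument(
        "--checkpoint",
        default=None,
        help="Path to a saved model; when given, training is skipped.",
    )
    return parser


def should_train(args):
    """Train from scratch only when no checkpoint was supplied."""
    return args.checkpoint is None
```

The main script would then branch on `should_train(args)` before the training step, so calibration and evaluation can be rerun against the same saved weights.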

Happy to help with the wrapper if you'd like. Peace.

New benchmarking wrapper

a968b86

lehendo assigned siddharthal and jhnwu3 Jun 26, 2026

Add multiple seeds and 0.01 alpha

4db5e3a

lehendo wants to merge 2 commits into sunlabuiuc:master from lehendo:cpbench

4 participants
