[CPBench] Benchmarking Wrapper #1168
Hello ac! Main things I ran into, as requested:

1. The wrapper does not fully match current PyHealth APIs

A few crashes came from the wrapper assuming that all datasets, tasks and models are built the same way. E.g., the wrapper tries to do something like

SleepEDF needed the default task path instead:

ISRUC failed because its dataset object did not have set_task at all:

SparcNet also failed because the wrapper passed arguments that the current SparcNet does not accept:

So I think the main issue is that the wrapper is written against a mixed PyHealth interface that not all datasets and models follow. Note: I didn't test against all interfaces, so there are likely more; the list above is not comprehensive.

3. SleepEDF preprocessing hit a data edge case

SleepEDF crashed because one recording did not contain “Sleep stage 4”:

Adding

4. The script prints results but can fail when saving them

After local patches, I got a capped SleepEDF run to print a result table, but then saving failed because the output folder did not exist:

FileNotFoundError: 'results/sleepedf_label_smoke.json'

It probably just needs to create the parent directory before writing the JSON.

5. Maybe a test mode would be useful?

SleepEDF processed around 415k samples, so every wrapper bug took a while to reach. I added a local cap (e.g. 1000 samples per split), which made it much easier to debug the full path quickly.

7. The PR mentions checkpoint loading, but I did not see it in the CLI

This would be helpful, because CP debugging should ideally be possible without retraining the

Happy to help with the wrapper if you'd like. Peace.
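The fixes suggested in the review could look roughly like the following sketch. It assumes nothing about PyHealth internals: the helper names (`filter_supported_kwargs`, `apply_task`, `cap_samples`, `save_results`) are hypothetical, and the code only illustrates the generic patterns of dropping constructor arguments a class does not accept, checking for `set_task` before calling it, capping each split, and creating the output directory before writing JSON.

```python
import inspect
import json
from pathlib import Path
from typing import Any, Callable, Dict, List, Optional, Sequence

# All helper names below are hypothetical; they are not part of PyHealth
# or of the PR's benchmark.py.


def filter_supported_kwargs(fn: Callable[..., Any], kwargs: Dict[str, Any]) -> Dict[str, Any]:
    """Keep only the keyword arguments that `fn` (a function or class) accepts."""
    params = inspect.signature(fn).parameters
    # If `fn` takes **kwargs it may accept anything, so pass everything through.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in params.values()):
        return dict(kwargs)
    return {k: v for k, v in kwargs.items() if k in params}


def apply_task(dataset: Any, task: Any) -> Any:
    """Call dataset.set_task(task) if it exists; otherwise fail with a clear message."""
    set_task = getattr(dataset, "set_task", None)
    if not callable(set_task):
        raise TypeError(
            f"{type(dataset).__name__} has no set_task(); "
            "this dataset needs its own task-building path"
        )
    return set_task(task)


def cap_samples(samples: Sequence[Any], max_samples: Optional[int]) -> List[Any]:
    """Truncate a split to at most `max_samples` items (None means no cap)."""
    if max_samples is None:
        return list(samples)
    return list(samples[:max_samples])


def save_results(results: Dict[str, Any], output: str) -> Path:
    """Write results as JSON, creating the parent directory first."""
    path = Path(output)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(results, indent=2))
    return path
```

Silently dropping arguments can hide configuration mistakes, so a real wrapper might also log which keys were filtered out.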
Benchmarking wrapper for the CPBench project.
Usage:
List what's available:
Full benchmark run (trains from scratch, tests at α = 0.01, 0.05, 0.10, 0.20):

    python benchmarks/cpbench/benchmark.py \
        --task sleep_staging_isruc \
        --method label \
        --data-path /path/to/ISRUC-I

Skip training with a saved checkpoint:

    python benchmarks/cpbench/benchmark.py \
        --task sleep_staging_isruc \
        --method label \
        --data-path /path/to/ISRUC-I \
        --checkpoint /path/to/model.pth

Quick smoke test (uses a tiny data subset and runs only α = 0.10):

    python benchmarks/cpbench/benchmark.py \
        --task sleep_staging_isruc \
        --method base \
        --data-path /path/to/ISRUC-I \
        --dev

Save results to JSON for aggregation across runs:

    python benchmarks/cpbench/benchmark.py \
        --task sleep_staging_isruc \
        --method label \
        --data-path /path/to/ISRUC-I \
        --output results/isruc_label.json
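For reference, here is a minimal argparse sketch of the command-line surface used in the examples above. The flag names come from those examples; the help strings, the `alphas_for` and `parse` helpers, and the constant names are assumptions and not the PR's actual benchmark.py. It also shows how `--dev` could narrow the α grid from the full 0.01/0.05/0.10/0.20 sweep to 0.10 only.

```python
import argparse
from typing import List, Optional, Sequence

# Hypothetical constants mirroring the alpha levels described in the usage notes.
FULL_ALPHAS = [0.01, 0.05, 0.10, 0.20]
DEV_ALPHAS = [0.10]


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="CPBench benchmarking wrapper (sketch)")
    parser.add_argument("--task", required=True, help="e.g. sleep_staging_isruc")
    parser.add_argument("--method", required=True, help="e.g. label or base")
    parser.add_argument("--data-path", required=True, help="dataset root directory")
    parser.add_argument("--checkpoint", default=None,
                        help="skip training and load this saved model")
    parser.add_argument("--dev", action="store_true",
                        help="tiny data subset, alpha = 0.10 only")
    parser.add_argument("--output", default=None,
                        help="write results to this JSON path")
    return parser


def alphas_for(dev: bool) -> List[float]:
    """Return the miscoverage levels to evaluate."""
    return list(DEV_ALPHAS if dev else FULL_ALPHAS)


def parse(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    # argparse maps "--data-path" to the attribute name "data_path".
    return build_parser().parse_args(argv)
```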