Update normalization parameters and add estimator params validation #210
avolkov-intel wants to merge 7 commits into
Conversation
| "data": { | ||
| "dataset": "hepmass", | ||
| "split_kwargs": { "train_size": 0.1, "test_size": null } | ||
| "split_kwargs": { "train_size": 0.1, "test_size": null }, |
Seems like this is the only case where benchmark behavior changes - is it intended?
I think it was done for a reason but let me check the convergence for both options
Essentially, there's no effect on the result: accuracy and the number of iterations stay the same.
| "dataset" : "cifar", | ||
| "split_kwargs": { "ignore" : true }, |
| "dataset" : "cifar", | |
| "split_kwargs": { "ignore" : true }, | |
| "dataset": "cifar", | |
| "split_kwargs": { "ignore": true }, |
| else: | ||
| logger.warning(f'Unknown "{normalize}" normalization type.') | ||
| if scaler is not None: | ||
| return pd.DataFrame(scaler.fit_transform(x), columns=x.columns, index=x.index) |
Wouldn't this make it ignore return_type == np.ndarray?
Currently it works correctly for all return_types because intermediate data is always represented in pandas format. However, this conversion is indeed redundant if return_type is not a pandas DataFrame.
Is that because it then goes through train_test_split? Isn't that step optional?
Description
Estimator parameter validation is useful when you want to override a parameter such as n_jobs for the benchmarks using -p algorithm:estimator_params:n_jobs=64. Currently this fails because some estimators don't support n_jobs, so you would need to run those benchmarks separately. With this approach, the parameter is simply ignored by estimators that don't support it, and a warning is shown.
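The idea can be illustrated with a minimal sketch. This is not the code from this PR: the helper name `filter_estimator_params` and the two estimator classes are hypothetical, and the sketch assumes that supported parameters can be read from the constructor signature (scikit-learn style estimators could likely use `get_params()` instead).

```python
import inspect
import logging

logger = logging.getLogger(__name__)


def filter_estimator_params(estimator_class, params):
    """Keep only the params accepted by estimator_class.__init__; warn about the rest.

    Hypothetical helper for illustration, not the benchmark's actual function.
    """
    accepted = inspect.signature(estimator_class.__init__).parameters
    # A constructor taking **kwargs accepts any name, so nothing can be ruled out.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD for p in accepted.values()):
        return dict(params)
    supported = {}
    for name, value in params.items():
        if name != "self" and name in accepted:
            supported[name] = value
        else:
            logger.warning(
                f'Parameter "{name}" is not supported by '
                f"{estimator_class.__name__} and will be ignored."
            )
    return supported


class ParallelEstimator:
    """Hypothetical estimator that supports n_jobs."""

    def __init__(self, n_estimators=100, n_jobs=None):
        self.n_estimators = n_estimators
        self.n_jobs = n_jobs


class SerialEstimator:
    """Hypothetical estimator without n_jobs."""

    def __init__(self, tol=1e-4):
        self.tol = tol


# The same override is applied to every estimator in the benchmark set.
overrides = {"n_jobs": 64}
parallel_params = filter_estimator_params(ParallelEstimator, overrides)
serial_params = filter_estimator_params(SerialEstimator, overrides)  # logs a warning
```

With this filtering in place, a single override such as n_jobs=64 can be passed to a mixed set of estimators: those that accept it receive it, and the others are constructed with their remaining parameters.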
Checklist:
Completeness and readability
Testing