Update normalization parameters and add estimator params validation by avolkov-intel · Pull Request #210 · IntelPython/scikit-learn_bench

avolkov-intel · 2026-05-06T08:38:25Z

Description

Add different normalization options
Remove implicit normalization from loaders
Add estimator parameters validation

Estimator parameter validation is useful when you want to override a parameter such as n_jobs for all benchmarks at once, e.g. with -p algorithm:estimator_params:n_jobs=64. Currently this fails because some estimators don't support n_jobs, so those benchmarks have to be run separately. With this change, estimators that don't support the parameter simply ignore it, and a warning is shown.
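A minimal sketch of how such a filter could work, assuming parameters are checked against the estimator's __init__ signature (scikit-learn's get_params discovers parameter names the same way). The function name filter_estimator_params, the warning text, and the ForestLike/KernelLike classes are hypothetical illustrations, not necessarily what this PR implements.

```python
import inspect
import logging

logger = logging.getLogger(__name__)


def filter_estimator_params(estimator_class, params):
    """Return only the entries of params that estimator_class.__init__ accepts.

    Unsupported parameters are dropped with a warning instead of raising,
    so a global override such as n_jobs=64 can be applied to every benchmark.
    """
    signature = inspect.signature(estimator_class.__init__)
    # If __init__ takes **kwargs we cannot tell what is supported; pass everything.
    if any(p.kind is inspect.Parameter.VAR_KEYWORD
           for p in signature.parameters.values()):
        return dict(params)
    supported = set(signature.parameters) - {"self"}
    filtered = {}
    for name, value in params.items():
        if name in supported:
            filtered[name] = value
        else:
            logger.warning(
                f'Parameter "{name}" is not supported by '
                f"{estimator_class.__name__} and will be ignored."
            )
    return filtered


# Usage with two stand-in estimator classes (hypothetical, for illustration only).
class ForestLike:
    def __init__(self, n_estimators=100, n_jobs=None):
        self.n_estimators = n_estimators
        self.n_jobs = n_jobs


class KernelLike:
    def __init__(self, C=1.0, kernel="rbf"):
        self.C = C
        self.kernel = kernel


override = {"n_jobs": 64}
forest = ForestLike(**filter_estimator_params(ForestLike, override))  # n_jobs=64 applied
kernel = KernelLike(**filter_estimator_params(KernelLike, override))  # warns, n_jobs dropped
```

Filtering by signature keeps one command line usable for the whole benchmark suite; the trade-off is that a misspelled parameter name is also only reported as a warning rather than an error.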

Checklist:

Completeness and readability

I have commented my code, particularly in hard-to-understand areas.
I have updated the documentation to reflect the changes or created a separate PR with updates and provided its number in the description, if necessary.
Git commit message contains an appropriate signed-off-by string (see CONTRIBUTING.md for details).
I have resolved any merge conflicts that might occur with the base branch.

Testing

I have run it locally and tested the changes extensively.
All CI jobs are green or I have provided justification why they aren't.
I have extended testing suite if new functionality was introduced in this PR.

ethanglaser · 2026-05-06T18:27:43Z

                "data": {
                    "dataset": "hepmass",
-                    "split_kwargs": { "train_size": 0.1, "test_size": null }
+                    "split_kwargs": { "train_size": 0.1, "test_size": null },


It seems this is the only case where the benchmark behavior changes. Is that intended?

avolkov-intel · 2026-05-07

I think it was done for a reason, but let me check the convergence for both options.

avolkov-intel · 2026-05-11

Essentially, there's no effect on the result: accuracy and the number of iterations stay the same.

ethanglaser · 2026-05-06T18:28:19Z

+                    "dataset" : "cifar",
+                    "split_kwargs": { "ignore" : true },


Suggested change

-                    "dataset" : "cifar",
-                    "split_kwargs": { "ignore" : true },
+                    "dataset": "cifar",
+                    "split_kwargs": { "ignore": true },

david-cortes-intel · 2026-05-08T12:14:51Z

+        else:
+            logger.warning(f'Unknown "{normalize}" normalization type.')
+        if scaler is not None:
+            return pd.DataFrame(scaler.fit_transform(x), columns=x.columns, index=x.index)


Wouldn't this make it ignore return_type == np.ndarray?

avolkov-intel · 2026-05-11

Currently it works correctly for all return_type values, because intermediate data is always represented as a pandas DataFrame. However, this conversion is indeed redundant when return_type is not a pandas DataFrame.

david-cortes-intel · 2026-05-11

Is that because it then goes through train_test_split? Isn't that step optional?
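A minimal sketch of one way to honor return_type in the scaling step: wrap the scaler output in a DataFrame only when the input was a DataFrame, and pass NumPy arrays through unchanged. The option names "standard" and "minmax", the SCALERS mapping, and normalize_data are hypothetical; the PR's actual normalization options are not shown in this excerpt.

```python
import logging

import numpy as np
import pandas as pd
from sklearn.preprocessing import MinMaxScaler, StandardScaler

logger = logging.getLogger(__name__)

# Hypothetical option names; the real ones are defined by the PR.
SCALERS = {"standard": StandardScaler, "minmax": MinMaxScaler}


def normalize_data(x, normalize=None):
    """Scale x with the requested scaler, preserving its container type."""
    if normalize is None:
        return x
    scaler_class = SCALERS.get(normalize)
    if scaler_class is None:
        logger.warning(f'Unknown "{normalize}" normalization type.')
        return x
    # fit_transform returns a NumPy array unless set_output() was configured.
    scaled = scaler_class().fit_transform(x)
    if isinstance(x, pd.DataFrame):
        # Rebuild the DataFrame only when the caller passed one in.
        return pd.DataFrame(scaled, columns=x.columns, index=x.index)
    return scaled


df = pd.DataFrame({"a": [1.0, 2.0, 3.0], "b": [10.0, 20.0, 30.0]})
df_scaled = normalize_data(df, "minmax")  # DataFrame, column "a" becomes 0.0, 0.5, 1.0
arr_scaled = normalize_data(np.array([[1.0], [2.0], [3.0]]), "minmax")  # stays an ndarray
```

An alternative with the same effect is scikit-learn's set_output(transform="pandas") (scikit-learn 1.2+), but dispatching on the input type avoids the DataFrame round-trip entirely for NumPy inputs.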

ethanglaser

LGTM once David's comments are resolved. We should add a JSON linter step to CI and unify formatting in sklbench configs.
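A minimal sketch of what such a linter step could check, assuming the only rules are that each config parses as JSON and has no whitespace before a colon (the issue fixed by the suggested change above). check_config_text and main are hypothetical names; a real CI step might instead use an off-the-shelf JSON formatter.

```python
import json
import re
from pathlib import Path

# Flags whitespace between a closing quote and a colon, e.g. "dataset" : "cifar".
# Simplification: string boundaries are not tracked, so a string value that
# itself contains '" :' would also be reported.
SPACE_BEFORE_COLON = re.compile(r'"\s+:')


def check_config_text(text):
    """Return a list of formatting problems found in one JSON config string."""
    try:
        json.loads(text)
    except json.JSONDecodeError as exc:
        return [f"invalid JSON: {exc}"]
    problems = []
    for lineno, line in enumerate(text.splitlines(), start=1):
        if SPACE_BEFORE_COLON.search(line):
            problems.append(f"line {lineno}: whitespace before ':'")
    return problems


def main(paths):
    """Check every file in paths; return a process exit code (0 = clean)."""
    exit_code = 0
    for path in paths:
        for problem in check_config_text(Path(path).read_text()):
            print(f"{path}: {problem}")
            exit_code = 1
    return exit_code
```

For example, a line like "split_kwargs": { "ignore" : true } is reported, while the suggested replacement passes. A CI job could call main() with the config file paths and fail when it returns 1.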

avolkov-intel added 4 commits May 4, 2026 22:28

Update preprocessing args (567b241)
Update scaling logic (fd6e4a5)
Fix scaling (ac310c1)
Add gisette normalization in SVM config (7df7818)

avolkov-intel requested review from david-cortes-intel and ethanglaser as code owners May 6, 2026 08:38

Add estimator parameters filter (5536db7)

avolkov-intel changed the title from "Update normalization parameters" to "Update normalization parameters and add estimator params validation" May 6, 2026

avolkov-intel added the "bug" (Something isn't working) and "extend" (Extend benchmarks) labels May 6, 2026

Code format (a2a7515)

ethanglaser reviewed May 6, 2026

david-cortes-intel reviewed May 8, 2026

Minor fixes (a627a86)

ethanglaser approved these changes May 12, 2026

avolkov-intel wants to merge 7 commits into IntelPython:main from avolkov-intel:dev/anatolyv-normalize-fix
