Model Size is big in auto sklearn #1359


Closed
shabir1 opened this issue Dec 27, 2021 · 7 comments

@shabir1

shabir1 commented Dec 27, 2021

Model Size is big in auto sklearn

The auto-sklearn model is much larger than a comparable scikit-learn model. Below are the examples:

1. With ensemble_size=30
AutoSklearnRegressor(
    ensemble_nbest=32,
    ensemble_size=30,
    include={'data_preprocessor': ['NoPreprocessing'],
             'feature_preprocessor': ['no_preprocessing']},
    max_models_on_disc=32,
    per_run_time_limit=100,
    time_left_for_this_task=350)

Model size: 789MB

2. With ensemble_size=10
AutoSklearnRegressor(
    ensemble_nbest=12,
    ensemble_size=10,
    include={'data_preprocessor': ['NoPreprocessing'],
             'feature_preprocessor': ['no_preprocessing']},
    max_models_on_disc=12,
    per_run_time_limit=100,
    time_left_for_this_task=350)

Model size: 786MB

3. With ensemble_size=1
AutoSklearnRegressor(
    ensemble_nbest=3,
    ensemble_size=1,
    include={'data_preprocessor': ['NoPreprocessing'],
             'feature_preprocessor': ['no_preprocessing']},
    max_models_on_disc=3,
    per_run_time_limit=100,
    time_left_for_this_task=350)
Selected Model:  Random Forest
Model size: 777MB

4. Run Sklearn Model
ExtraTreesRegressor(n_estimators=30, random_state=0)
Model size: 58MB

5. Run Sklearn Model (the same model that AutoSklearnRegressor selected with ensemble_size=1, run with the same parameters; the model sizes differ hugely: 122MB versus 777MB)
RandomForestRegressor(bootstrap=True, criterion='mse')
Model size: 122MB

I ran auto-sklearn without feature or data preprocessing, but the model size is still very large.
If this is due to the ensemble size, why is the model size almost the same for ensemble sizes of 30, 10, and 1?

@eddiebergman
Contributor

This is likely due to all the imports, saved predictions, and everything else we use for optimization being bundled into the object. I would not use the full auto-sklearn model in production at the moment; instead, try to export or retrain the found models.
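To illustrate why a wrapper object can be far larger than the model it contains, here is a minimal sketch using only the standard library. The dictionaries below are hypothetical stand-ins, not auto-sklearn's actual internal layout; `pickled_size_bytes` is a helper name introduced for this example. The same helper can be applied to a fitted estimator to compare sizes.

```python
import pickle


def pickled_size_bytes(obj):
    """Return the number of bytes needed to pickle obj."""
    return len(pickle.dumps(obj, protocol=pickle.HIGHEST_PROTOCOL))


# Hypothetical stand-ins: a small "model" plus large optimization by-products.
best_model = {"coefficients": [0.1] * 1_000}
search_result = {
    "best_model": best_model,
    # Assumed extra state, e.g. predictions kept from the search.
    "saved_predictions": [[0.0] * 1_000 for _ in range(200)],
}

full_size = pickled_size_bytes(search_result)
model_only_size = pickled_size_bytes(search_result["best_model"])
print(f"whole object: {full_size} bytes, model only: {model_only_size} bytes")
```

Pickling only the inner model drops everything that is needed for the search but not for prediction, which is the idea behind exporting or retraining the found models.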

@shabir1
Author

shabir1 commented Jan 10, 2022

@eddiebergman Do you mean that I should take the best configurations and build an sklearn ensemble model from them?

@eddiebergman
Contributor

There are ways to do that by extracting the models. This was made easier in a recent PR by @userfindingself in #1321, which is available in the development branch; otherwise, you can view the code there and adapt it to your needs. We would eventually like to add an export option for more production-ready models.
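Once the individual models have been extracted or retrained, one way to rebuild a lightweight ensemble is to average their predictions with the ensemble weights. The following is only a sketch under assumptions: `WeightedAverageEnsemble` and `ConstantModel` are hypothetical names introduced here, each member is assumed to expose a scikit-learn-style `predict(X)`, and the weights are assumed to have been read from the fitted auto-sklearn object beforehand.

```python
class WeightedAverageEnsemble:
    """Hypothetical helper: weighted average of member predictions."""

    def __init__(self, members):
        # members: list of (weight, model) pairs; each model has predict(X).
        total = sum(weight for weight, _ in members)
        if total <= 0:
            raise ValueError("weights must sum to a positive value")
        # Normalize so that the weights sum to 1.
        self.members = [(weight / total, model) for weight, model in members]

    def predict(self, X):
        predictions = [0.0] * len(X)
        for weight, model in self.members:
            for i, value in enumerate(model.predict(X)):
                predictions[i] += weight * value
        return predictions


class ConstantModel:
    """Toy stand-in for a retrained scikit-learn regressor."""

    def __init__(self, value):
        self.value = value

    def predict(self, X):
        return [self.value for _ in X]


ensemble = WeightedAverageEnsemble(
    [(0.75, ConstantModel(2.0)), (0.25, ConstantModel(6.0))]
)
print(ensemble.predict([[0], [1]]))
```

Such an object stores only the member models and their weights, so its pickled size is roughly the sum of the members' sizes.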

@shabir1
Author

shabir1 commented Jan 10, 2022

@eddiebergman In the next version of auto-sklearn, will it be possible to get a small final model for prediction?

@eddiebergman
Contributor

We are reworking some internals, so this will likely not be a feature in the next release, apologies.

@shabir1
Author

shabir1 commented Jan 10, 2022

@eddiebergman Thank you, sir, for your quick response.

shabir1 closed this as completed Jan 10, 2022
@mfeurer
Contributor

mfeurer commented Jan 10, 2022

One additional note: Auto-sklearn uses 512 trees, while scikit-learn uses only 100 trees by default. This explains why the models are in the ballpark of > 500MB.
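As a rough check of this explanation, the sizes reported earlier in the thread can be scaled by the tree count. This is a back-of-the-envelope sketch that assumes forest size grows linearly with the number of trees and that the trees are of similar depth; `scaled_forest_size_mb` is a helper name introduced for this example.

```python
# Sizes reported earlier in this thread (MB).
SKLEARN_RF_SIZE_MB = 122.0    # RandomForestRegressor, default number of trees
AUTOSKLEARN_SIZE_MB = 777.0   # AutoSklearnRegressor with ensemble_size=1

SKLEARN_DEFAULT_TREES = 100
AUTOSKLEARN_TREES = 512


def scaled_forest_size_mb(size_mb, trees_from, trees_to):
    """Estimate forest size, assuming it grows linearly with tree count."""
    return size_mb * trees_to / trees_from


estimate = scaled_forest_size_mb(
    SKLEARN_RF_SIZE_MB, SKLEARN_DEFAULT_TREES, AUTOSKLEARN_TREES
)
unexplained = AUTOSKLEARN_SIZE_MB - estimate
print(f"estimated 512-tree forest: {estimate:.0f} MB")
print(f"left for other stored state: {unexplained:.0f} MB")
```

Under these assumptions the forest alone accounts for about 625 MB of the 777 MB, leaving roughly 152 MB for the other state bundled into the object.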
