[Question] How to know the data and feature preprocessing used in the ensemble? #1633

Open
TuanDTr opened this issue Dec 19, 2022 · 2 comments


TuanDTr commented Dec 19, 2022

Hi, the method AutoSklearnClassifier().show_models() displays the models found in the ensemble. I wonder if it is possible to know exactly which data or feature preprocessing steps were applied before training each model. show_models() only returns the choice objects:

{'model_id': 2, 
'rank': 1, 
'cost': 0.04255319148936165, 
'ensemble_weight': 0.04, 
'data_preprocessor': <autosklearn.pipeline.components.data_preprocessing.DataPreprocessorChoice object at 0x7f704fb2dee0>, 
'balancing': Balancing(random_state=1), 
'feature_preprocessor': <autosklearn.pipeline.components.feature_preprocessing.FeaturePreprocessorChoice object at 0x7f70a7e4e7f0>,
'classifier': <autosklearn.pipeline.components.classification.ClassifierChoice object at 0x7f70a7e4e1c0>, 
'sklearn_classifier': RandomForestClassifier(max_features=5, n_estimators=512, n_jobs=1,
                       random_state=1, warm_start=True)}

and it is not clear which preprocessing steps these objects represent. Is it possible to get the preprocessing steps in such a case?

Many thanks!

@eddiebergman
Contributor

Sorry for the delay,

You could also use estimator.leaderboard(detailed=True) to get a pandas DataFrame that shows the string name of each choice made. However, data_preprocessor encompasses many sub-choices, which is kind of a pain point.

Most likely you can use this:

models = estimator.show_models()
model = ...  # Select one

model_id = model["model_id"]

runhistory = estimator.automl_.runhistory_
full_config = runhistory.ids_config[model_id]

@bonfire666666

I don't think the model_id from show_models() matches the ID used as the key in ids_config.
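
If the two IDs really do differ, one possible workaround is to look the configuration up through the run records instead of indexing ids_config with model_id directly. The sketch below is not a confirmed auto-sklearn API. It assumes that each entry in runhistory.data maps a run key with a config_id attribute to a run value whose additional_info dict stores the model ID under "num_run". The helper name config_for_model is hypothetical.

```python
def config_for_model(runhistory, model_id):
    """Return the configuration of the run that reported `model_id`.

    Assumptions (not confirmed in this thread):
      * runhistory.data maps run keys to run values,
      * each run key has a `config_id` attribute,
      * each run value has an `additional_info` dict (or None) that stores
        the model ID under the key "num_run".
    """
    for run_key, run_value in runhistory.data.items():
        info = run_value.additional_info or {}
        if info.get("num_run") == model_id:
            # Translate the matching run's config_id into the configuration.
            return runhistory.ids_config[run_key.config_id]
    raise KeyError(f"no run with num_run={model_id} in the runhistory")
```

With the names from the snippet above, usage would be config_for_model(estimator.automl_.runhistory_, model["model_id"]). Printing the result should list the chosen hyperparameters, including the data and feature preprocessing choices, provided the assumptions hold for your installed version.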
