Changes show_models() function to return a dictionary of models in ensemble #1321


Merged
merged 8 commits into from
Dec 25, 2021
55 changes: 46 additions & 9 deletions autosklearn/automl.py
@@ -1834,15 +1834,52 @@ def get_models_with_weights(self):
         return self.ensemble_.get_models_with_weights(self.models_)
 
     def show_models(self):
-        models_with_weights = self.get_models_with_weights()
-
-        with io.StringIO() as sio:
-            sio.write("[")
-            for weight, model in models_with_weights:
-                sio.write("(%f, %s),\n" % (weight, model))
-            sio.write("]")
-
-            return sio.getvalue()
+        """ Returns a dictionary containing models and their information
+        where model_id/run_id is the key """
+
+        ensemble_dict={}
+
+        def has_key(rv, key):
+            return rv.additional_info and key in rv.additional_info
+
+        table_dict={}
+        for rkey, rval in self.runhistory_.data.items():
+            if has_key(rval, 'num_run'):
+                table_dict[rval.additional_info['num_run']]={
+                    'model_id': rval.additional_info['num_run'],
+                    'cost': rval.cost}
+
+        for i, weight in enumerate(self.ensemble_.weights_):
+            (_, model_id, _) = self.ensemble_.identifiers_[i]
+            table_dict[model_id]['ensemble_weight']=weight
+
+        table=pd.DataFrame.from_dict(table_dict, orient='index')
+        table.sort_values(by='cost', inplace=True)
+        table['rank']=range(1,len(table)+1)
+
+        for (_, model_id, _), model in self.models_.items():
Contributor

self.models_ may be empty if a user specifies cross-validation, in which case self.cv_models_ is set instead. We want to unify the two, but at the moment that doesn't happen.

self.cv_models_

A test probably needs to be created for this; I will elaborate on creating one in a moment.

Contributor Author

@sagar-kaushik, Nov 27, 2021

Did not know that. So do you want me to put some condition to check for this and use either self.models_ or self.cv_models_ inside the loop, depending on whichever is non-empty, or should I just create the test for now?

Contributor

Yeah, you can either check whether the resampling_strategy is cv or, probably more useful, just check which one is empty.

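The "check which one is empty" option could look roughly like the sketch below. The helper name pick_fitted_models and the stand-in object are hypothetical and not part of auto-sklearn; only the attribute names models_ and cv_models_ come from the discussion above, and both are assumed to be dicts (or None) keyed by identifier tuples.

```python
from types import SimpleNamespace


def pick_fitted_models(automl):
    """Return whichever of models_ / cv_models_ is populated.

    Hypothetical helper: assumes both attributes are dicts (or None)
    keyed by identifier tuples, as the loop in show_models() suggests.
    """
    models = getattr(automl, "models_", None)
    cv_models = getattr(automl, "cv_models_", None)
    if models:
        return models
    if cv_models:
        return cv_models
    raise ValueError("No fitted models found in models_ or cv_models_")


# Stand-in object mimicking a cross-validation run, where models_ is empty.
fake_cv_run = SimpleNamespace(models_={}, cv_models_={(0, 2, 0.0): "cv-model"})
chosen = pick_fitted_models(fake_cv_run)
```

Raising when both are empty is a design choice of this sketch; returning an empty dictionary would be an equally plausible behaviour.
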
+            model_dict={} #Empty model dictionary
+
+            #Inserting rank and ensemble weight
+            model_dict['model_id']=table.loc[model_id]['model_id']
+            model_dict['rank']=table.loc[model_id]['rank']
+            model_dict['ensemble_weight']=table.loc[model_id]['ensemble_weight']
+
+            # The steps in the models pipeline will be saved in the dictionary as follows:
+            # 'data_preprocessing': DataPreprocessor,
+            # 'balancing': Balancing,
+            # 'feature_preprocessor': FeaturePreprocessorChoice,
+            # 'classifier': ClassifierChoice -> autosklearn wrapped model
+            for step in model.steps:
+                model_dict[step[0]]=step[1]
+
+            #Adding sklearn model to the model dictionary
+            model_dict['sklearn_model']=model.steps[-1][1].choice.estimator
Contributor

This partly duplicates the step included above, but I agree with your choice here: it lets a user specifically get either the auto-sklearn wrapped version or the sklearn estimator, which I suspect is of more relevance to people. No change needed, just commenting that I agree with it.

However, I might still explicitly split out the tuple:

model_type, autosklearn_wrapped_model = model.steps[-1]
model_dict['sklearn_model'] = autosklearn_wrapped_model.choice.estimator

If the automatic checks give you a hard time with model_type not being used, you can do the following:

autosklearn_wrapped_model = model.steps[-1][1]
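
A further option, sketched here as an assumption rather than something proposed in the thread, is to keep the tuple unpacking but bind the unused name to `_`, which most linters treat as intentionally ignored. The SimpleNamespace objects below are placeholders and not auto-sklearn classes; they only mimic the .steps and .choice.estimator attributes used in the diff.

```python
from types import SimpleNamespace

# Placeholder for a fitted pipeline: steps is a list of (name, component)
# pairs, and the last component exposes .choice.estimator as in the diff.
estimator = object()
wrapped = SimpleNamespace(choice=SimpleNamespace(estimator=estimator))
model = SimpleNamespace(steps=[("balancing", None), ("classifier", wrapped)])

# "_" signals that the step name is intentionally unused.
_, autosklearn_wrapped_model = model.steps[-1]
sklearn_model = autosklearn_wrapped_model.choice.estimator
```
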


+            #Adding model to ensemble dictionary
+            ensemble_dict[model_id] = model_dict
+        return ensemble_dict
+
 
     def _create_search_space(self, tmp_dir, backend, datamanager,
                              include=None,
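
To make the shape of the new return value easier to picture, here is a minimal plain-Python sketch of the ranking and weighting logic on toy data. The model ids, costs and weights are invented for illustration, pandas is replaced by built-in sorting, and the sketch assumes that only ensemble members end up in the result (the pipeline-step entries are omitted).

```python
# Toy stand-ins for run history entries: model_id -> cost (lower is better).
costs = {7: 0.12, 3: 0.05, 11: 0.20}
# Toy ensemble weights for the models that made it into the ensemble.
weights = {3: 0.6, 7: 0.4}

# Rank every evaluated model by cost, starting at 1, mirroring
# sort_values(by='cost') followed by range(1, len(table) + 1) in the diff.
ranked_ids = sorted(costs, key=costs.get)
rank = {model_id: pos for pos, model_id in enumerate(ranked_ids, start=1)}

# Build the per-model dictionaries, keyed by model_id.
ensemble_dict = {
    model_id: {
        "model_id": model_id,
        "rank": rank[model_id],
        "ensemble_weight": weights[model_id],
    }
    for model_id in weights
}
```

Note that the rank is computed over all evaluated models, so an ensemble member's rank can be larger than the number of models in the ensemble.
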