Using gridsearch to test different models #257

Closed
vincefav opened this issue Sep 29, 2017 · 4 comments
@vincefav

Apologies if this has been discussed already, but I couldn't find any mention of it.

I see an example in the documentation of how to tune the parameters using a grid search, but would it be possible to also try different combinations of models? Here's the code I used, but it throws an error:


reg = StackingRegressor(regressors=[enet, gboost, rf])
params = {'meta_regressor': [lasso, xgb, linreg]}

And then I get an error saying I can't initialize the regressor without a meta model:

TypeError: __init__() missing 1 required positional argument: 'meta_model'

Similarly, it would also be nice to try different combinations of regressors.

Would it be possible to add this sort of functionality? Or is there a workaround I can use in the meantime?

@rasbt
Owner

rasbt commented Sep 30, 2017

Phew, that's a tricky one ;).

Looking at the code, I see that we have something like:

            for key, value in six.iteritems(super(StackingClassifier,
                                            self).get_params(deep=False)):
                if key in ('classifiers', 'meta-classifier'):
                    continue
                else:
                    out['%s' % key] = value

            return out

which is basically hiding those two from scikit-learn's grid search. The if clause could be removed to allow tuning the classifiers and meta-classifier during grid search as well. I just did some experiments, and e.g., the following would work after making that change (note that the n_neighbors values are also provided for the [clf2, clf3] setting, but GridSearchCV just ignores them in case KNeighborsClassifier (clf1) is not present):

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.naive_bayes import GaussianNB 
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV
from mlxtend.classifier import StackingClassifier

from sklearn import datasets

iris = datasets.load_iris()
X, y = iris.data[:, 1:3], iris.target

# Initializing models

clf1 = KNeighborsClassifier(n_neighbors=1)
clf2 = RandomForestClassifier(random_state=1)
clf3 = GaussianNB()
lr = LogisticRegression()
sclf = StackingClassifier(classifiers=[clf1, clf2, clf3], 
                          meta_classifier=lr)

# Search over two base-classifier subsets and over KNN's n_neighbors
params = {'classifiers': [[clf1, clf2, clf3], [clf2, clf3]],
          'kneighborsclassifier__n_neighbors': [1, 5]}

grid = GridSearchCV(estimator=sclf, 
                    param_grid=params, 
                    cv=5,
                    refit=True)
grid.fit(X, y)

cv_keys = ('mean_test_score', 'std_test_score', 'params')

for r, _ in enumerate(grid.cv_results_['mean_test_score']):
    print("%0.3f +/- %0.2f %r"
          % (grid.cv_results_[cv_keys[0]][r],
             grid.cv_results_[cv_keys[1]][r] / 2.0,
             grid.cv_results_[cv_keys[2]][r]))

print('Best parameters: %s' % grid.best_params_)
print('Accuracy: %.2f' % grid.best_score_)

So, in short, yeah, changing

            for key, value in six.iteritems(super(StackingClassifier,
                                            self).get_params(deep=False)):
                if key in ('classifiers', 'meta-classifier'):
                    continue
                else:
                    out['%s' % key] = value

            return out

to

            for key, value in six.iteritems(super(StackingClassifier,
                                            self).get_params(deep=False)):
                out['%s' % key] = value

            return out

would allow that! Happy to make that change (the reason I included the if-else was that I wasn't sure how it would be handled by GridSearchCV, but it seems to be okay :))
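
For reference, once those keys are exposed, the meta-classifier should be searchable the same way. Below is a minimal sketch (not tested against a released version), reusing sclf, clf1-clf3, X, and y from the example above; the LogisticRegression C values are just placeholders:

# With 'classifiers' and 'meta_classifier' exposed by get_params, both can
# appear in the parameter grid (the C values below are arbitrary placeholders).
params = {'classifiers': [[clf1, clf2, clf3], [clf2, clf3]],
          'meta_classifier': [LogisticRegression(C=0.1),
                              LogisticRegression(C=10.0)]}

grid = GridSearchCV(estimator=sclf,
                    param_grid=params,
                    cv=5,
                    refit=True)
grid.fit(X, y)
print('Best parameters: %s' % grid.best_params_)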

@vincefav
Author

vincefav commented Sep 30, 2017

Awesome, thanks Sebastian!

I was having trouble with the StackingRegressor, but finally got it working! It looks like I still need to provide some defaults when initializing it?
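
For reference, the pattern that ends up working on the regressor side looks roughly like the following minimal sketch, assuming enet, gboost, rf, lasso, xgb, and linreg are already-constructed estimators as in the snippet at the top of the issue, and X, y are the training data:

from sklearn.model_selection import GridSearchCV
from mlxtend.regressor import StackingRegressor

# Both regressors and meta_regressor must be supplied at initialization
# (omitting meta_regressor is what caused the TypeError above); the grid
# then swaps the meta-regressor out during the search.
reg = StackingRegressor(regressors=[enet, gboost, rf],
                        meta_regressor=lasso)

params = {'meta_regressor': [lasso, xgb, linreg]}

grid = GridSearchCV(estimator=reg,
                    param_grid=params,
                    cv=5,
                    refit=True)
grid.fit(X, y)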

@rasbt
Owner

rasbt commented Oct 1, 2017

Yeah, I think defaults are required to get it working (maybe in the future we could provide some sensible defaults for the regressors and meta-regressors). Glad to hear that it's working, and I will update the mlxtend implementations with the modification I mentioned above so that you don't have to tweak the code manually ;)

@rasbt
Owner

rasbt commented Oct 2, 2017

Addressed now via #259. There's one little caveat though (and I added it to the docs): you cannot search over both the classifiers/regressors and their parameters at the same time (it may be due to how GridSearchCV is currently implemented in sklearn; but I can try to see what happens if I set default arguments for the classifiers and regressors params).

Anyway, what I mean is that, for instance, the following parameter dictionary works in the sense that it does not produce an error:

params = {'randomforestclassifier__n_estimators': [1, 100],
          'classifiers': [(clf1, clf1, clf1), (clf2, clf3)]}

However, it will ignore 'randomforestclassifier__n_estimators' for the estimators inside 'classifiers'.
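
One possible workaround for that caveat (just a sketch, not something built into mlxtend): pre-configure the candidate estimators and search only over 'classifiers', so the different n_estimators settings are baked into separate instances. rf_small and rf_large are hypothetical names; clf1-clf3 are the estimators from the example above:

# Instead of combining 'randomforestclassifier__n_estimators' with
# 'classifiers' in one grid, bake each setting into its own instance.
rf_small = RandomForestClassifier(n_estimators=1, random_state=1)
rf_large = RandomForestClassifier(n_estimators=100, random_state=1)

params = {'classifiers': [(clf1, rf_small, clf3),
                          (clf1, rf_large, clf3),
                          (clf2, clf3)]}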
