Out of fold stacking regressor #201
Conversation
Hello @EikeDehling! Thanks for updating the PR. Cheers! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on June 13, 2017 at 02:40 UTC
Hi @EikeDehling, thanks a lot for the PR, this sounds awesome, and I am really looking forward to contributions like these! I haven't read through the code in detail, but based on the description on your website, it looks like it is an approach similar to the `StackingCVClassifier`. However, based on the figure, it looks like it adds the average of the 1st-level models' predictions as the input for the final prediction? I am just wondering about the differences and similarities between the two approaches. Generally, I am all in favor of adding this implementation to mlxtend, thanks a lot for considering this contribution!
Hi @rasbt, thanks for the quick response. The algorithm indeed looks like StackingCVClassifier, but adapted for regression. It looks like there are some subtle differences though. I'm fine with your naming suggestion; my only remark is: there is no cross-validation going on in the algorithm as far as I can see? I don't have a strong opinion on this naming though, happy to go with your choice. The algorithm divides the training data into K folds. It then trains N instances of each base model type, each on K-1 parts of the training data, and each instance makes predictions for the remaining fold. The predictions for each model are then concatenated and used as input for the second-level model. This is all identical to the StackingCVClassifier. Now the difference starts: the N instances of each base model are kept and used during prediction. There it makes N predictions, one with each instance of the base model, and averages them as input for the second-level model. Then the second-level model predicts the final output. I'm not a theoretical ML expert, so I'm not sure which approach works better. Perhaps the ideas from StackingCVClassifier would even be an improvement? I would be happy to run some experiments and then code the best-performing version. Best regards, Eike
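To make the procedure concrete, here is a minimal sketch of the fit/predict logic described above, assuming NumPy arrays and scikit-learn-compatible estimators; the function and variable names are made up for illustration and are not the actual code from this PR:

```python
import numpy as np
from sklearn.base import clone
from sklearn.model_selection import KFold

def fit_oof_stacking(base_models, meta_model, X, y, n_folds=5):
    """Fit per-fold copies of each base model and a meta-model on
    their out-of-fold predictions."""
    kf = KFold(n_splits=n_folds, shuffle=True, random_state=0)
    fitted = []  # one list of per-fold models for each base model
    meta_features = np.zeros((X.shape[0], len(base_models)))
    for j, model in enumerate(base_models):
        fold_models = []
        for train_idx, holdout_idx in kf.split(X):
            m = clone(model).fit(X[train_idx], y[train_idx])
            # out-of-fold predictions become the meta-features
            meta_features[holdout_idx, j] = m.predict(X[holdout_idx])
            fold_models.append(m)
        fitted.append(fold_models)
    meta_model.fit(meta_features, y)
    return fitted, meta_model

def predict_oof_stacking(fitted, meta_model, X):
    """Average the per-fold predictions of each base model and feed
    the averages to the meta-model."""
    meta_features = np.column_stack([
        np.mean([m.predict(X) for m in fold_models], axis=0)
        for fold_models in fitted
    ])
    return meta_model.predict(meta_features)
```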
Thanks for the thorough explanation, @EikeDehling!
I agree with you, it's not really k-fold cross-validation going on here, but rather a k-fold-like sampling; I think we chose that name because we had the fold-based splitting procedure in mind.
I also don't have a strong preference, but maybe we should stick to `StackingCVRegressor` for consistency with the `StackingCVClassifier`.
Is this based on some paper in literature, or is this a "new" algorithm you came up with during experimentation? If you'd run some experiments to compare both approaches, that would be great, but don't worry about it if it's too much work (however, that could also be an interesting study for a paper if this hasn't been done before :)).
Say we have n=10 samples in the dataset and k=5 1st-level regressors. Then, during prediction, the 1st-level regressors first produce n*k=50 predictions, which are then averaged over k so that the 2nd-level regressor gets n values instead of all 50 values? And the final prediction is the prediction of the 2nd-level regressor on those n values? I think the only difference to the StackingCVClassifier would be the averaging part. I.e., if you would stack the predictions instead of averaging them prior to giving them to the 2nd-level regressor, the algorithms would be identical except for classification vs. regression. If that's indeed the case, I'd suggest maybe toggling the averaging via a parameter. For example, something like the following,
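(The parameter name `average_fold_models` below is hypothetical, just to illustrate the idea; it is not an actual mlxtend argument.)

```python
from sklearn.linear_model import Lasso, Ridge
from sklearn.svm import SVR

# Hypothetical signature -- `average_fold_models` is made up for illustration.
#   True:  keep the per-fold level-1 models and average their predictions
#          at prediction time (the behavior currently in this PR)
#   False: re-fit each level-1 regressor on the full training set and use
#          that single model (StackingCVClassifier-style behavior)
stack = StackingCVRegressor(regressors=[Lasso(), Ridge()],
                            meta_regressor=SVR(),
                            average_fold_models=True)
```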
or sth like that. The default could then be the setting that mirrors the StackingCVClassifier behavior. PS: There's another parameter in the `StackingCVClassifier`, `use_features_in_secondary`, which fits the meta-classifier to the original input features in addition to the predictions of the original classifiers.
I am not sure if that helps with the performance of the StackingCVRegressor in practice, though. But it could be included as an additional option so that users can run their own experiments ... (I just see that "predictions of the original classifiers." should probably be changed to "predictions of the level-1 classifiers." for clarity.)
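For reference, what `use_features_in_secondary` amounts to is roughly the following; this is a sketch with illustrative variable names, not the actual mlxtend code:

```python
import numpy as np

# X:             original feature matrix, shape (n_samples, n_features)
# meta_features: out-of-fold predictions of the level-1 regressors,
#                shape (n_samples, n_level1_regressors)
if use_features_in_secondary:
    # the meta-regressor sees the original features plus the level-1 predictions
    meta_input = np.hstack((X, meta_features))
else:
    # the meta-regressor sees only the level-1 predictions
    meta_input = meta_features
meta_regressor.fit(meta_input, y)
```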
Hi! The algorithm came from here; there is a good graphical explanation also: https://dnc1994.com/2016/05/rank-10-percent-in-first-kaggle-competition-en/#Stacking Renamed to StackingCVRegressor now. I will do some experiments on re-training the level-1 models vs. using multiple instances and averaging, and will let you know. I will also have a look at documenting this. Thanks!
I've tried out what difference the approaches make:
1. Keep the N per-fold instances of each level-1 regressor and average their predictions at prediction time.
2. Re-fit each level-1 regressor on the full training set and use that single model at prediction time.
Approach 2 is what the existing StackingCVClassifier does; approach 1 is what I saw documented elsewhere. My results: https://www.kaggle.com/eikedehling/trying-out-stacking-approaches/code Summary: It doesn't make any significant difference for the results. Sometimes one version is slightly better, sometimes the other version does a bit better.
Thanks for testing these out! Just wanted to mention that in 2), the level-1 classifiers (in the StackingCVClassifier) are also fit on the whole training set in the end. If it's not too complicated, it would be nice to have a parameter to toggle between the two different approaches as mentioned above. Either way, as long as it's documented what it does, it would be fine :)
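A minimal sketch of that final re-fitting step, assuming scikit-learn-style cloning; the attribute name `regr_` is illustrative and may differ from the actual implementation:

```python
from sklearn.base import clone

# After the out-of-fold meta-features have been used to fit the
# meta-regressor, each level-1 regressor is re-fit on the complete
# training set; these re-fit models are the ones used in predict().
self.regr_ = [clone(regr).fit(X, y) for regr in self.regressors]
```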
# is trained and makes predictions, after which we train the
# meta-regressor on their combined results.
#
for i, clf in enumerate(self.regressors):
Not that important since it's an internal variable, but I would suggest changing `clf` to `regr` or so.
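Applied to the quoted loop, the rename would simply be:

```python
# `regr` reads more naturally than `clf` in a regressor class
for i, regr in enumerate(self.regressors):
    ...
```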
I just re-examined the code and made a high-level summary for myself:
StackingCVClassifier: if we have cv=3, that is 3 test folds, and each level-1 classifier is fit 3 times on the remaining folds; the out-of-fold predictions are used to fit the meta-classifier, and the level-1 classifiers are then re-fit on the whole training set for predicting on new data.
Current StackingCVRegressor: if the test fold has 50 samples, and we have 3 test folds, each level-1 regressor is likewise fit 3 times and its out-of-fold predictions are used to fit the meta-regressor. For the test set prediction, we use all n_level1_regressors x n_folds regressors and average the per-fold predictions for each level-1 regressor.
Diff: the difference is only in how predictions on new data are made — re-fitting the level-1 models on the whole training set (classifier) vs. keeping the per-fold models and averaging them (current regressor).
I think your implementation looks fine! I probably wouldn't add more complexity if the approach works well in practice. We just need to document the behavior properly :)
Maybe you could prepare a Jupyter Notebook similar to the existing ones in the documentation?
Hi @rasbt, thanks for the feedback and review! I've adjusted the StackingCVRegressor to match the StackingCVClassifier's algorithm (training the level-1 regressors on the full dataset; getting rid of the N regressors and averaging). The results were just as good, so maybe we stick with that approach. Also implemented the use_features_in_secondary option. I've made a start with a notebook in the docs; I will let you know when I'm done there. Best, Eike
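For anyone following along, usage of the resulting class would look roughly like this (a sketch based on the discussion above; exact signatures and defaults may differ from the merged version):

```python
from mlxtend.regressor import StackingCVRegressor
from sklearn.datasets import make_regression
from sklearn.linear_model import Lasso, Ridge
from sklearn.svm import SVR

# toy regression data
X, y = make_regression(n_samples=200, n_features=10, random_state=0)

# level-1 regressors plus a meta-regressor trained on their
# out-of-fold predictions (optionally with the original features)
stack = StackingCVRegressor(regressors=(Lasso(), Ridge()),
                            meta_regressor=SVR(kernel='rbf'),
                            use_features_in_secondary=True)
stack.fit(X, y)
print(stack.predict(X[:5]))
```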
I think the Jupyter notebook now covers the important things. The other notebooks also include API docs; maybe you can help me get that working. I assume that's automatically generated?
I think that option is enabled now - can you edit? Thanks!
Alright, I just set up the documentation so that it can be uploaded to the web docs when I make the next mlxtend version release. I also made some small modifications to the Jupyter notebook. If that looks okay to you, I think this PR is ready to be merged :)
Hi @rasbt, cool, looks great to me! Thanks for that :-) Eike
Thanks for all the work and great contribution, really appreciate it!
Cool, great!
Description
I've implemented a new ensemble regressor for out-of-fold stacking. It's a different approach to training the base regressors that better avoids overfitting. For a description of the algorithm, see:
https://dnc1994.com/2016/05/rank-10-percent-in-first-kaggle-competition-en/#Stacking
I've only implemented the algorithm and some basic tests, but not written documentation yet - right now I'd like to know whether you are interested in including this algorithm in the mlxtend code base before I spend more time on it.
If you're interested in including this, I'm happy to iterate on review/code!
Thanks for taking a look at this!
Pull Request requirements
- Added appropriate unit test functions in the `./mlxtend/*/tests` directories
- Ran `nosetests ./mlxtend -sv` and made sure that all unit tests pass
- Checked the test coverage by running `nosetests ./mlxtend --with-coverage`
- Checked for style issues by running `flake8 ./mlxtend`
- Added a note about the modification/contribution to the `./docs/sources/CHANGELOG.md` file
- Modified the documentation in the appropriate location under `mlxtend/docs/sources/` (optional)