Implementation of the .632 bootstrap #283

Merged · 5 commits · Nov 12, 2017
1 change: 1 addition & 0 deletions docs/mkdocs.yml
@@ -53,6 +53,7 @@ pages:
- user_guide/data/wine_data.md
- evaluate:
- user_guide/evaluate/bootstrap.md
- user_guide/evaluate/bootstrap_point632_score.md
- user_guide/evaluate/BootstrapOutOfBag.md
- user_guide/evaluate/confusion_matrix.md
- user_guide/evaluate/lift_score.md
5 changes: 5 additions & 0 deletions docs/sources/CHANGELOG.md
@@ -14,6 +14,7 @@ The CHANGELOG for the current development version is available at

##### New Features

- Added `mlxtend.evaluate.bootstrap_point632_score` to evaluate the performance of estimators using the .632 bootstrap [#283](https://github.com/rasbt/mlxtend/pull/283)
- New `max_len` parameter for the frequent itemset generation via the `apriori` function to allow for early stopping. ([#270](https://github.com/rasbt/mlxtend/pull/270))

##### Changes
@@ -24,6 +25,10 @@ Note that this didn't cause any difference in performance on any of the test scenarios
[#262](https://github.com/rasbt/mlxtend/pull/262)
- utils.Counter now accepts a name variable to help distinguish between multiple counters, time precision can be set with the 'precision' kwarg and the new attribute end_time holds the time the last iteration completed. [#278](https://github.com/rasbt/mlxtend/pull/278)

##### Bug Fixes

- Fixed an issue that occurred with the McNemar test when using SciPy 1.0. [#283](https://github.com/rasbt/mlxtend/pull/283)


### Version 0.9.0 (2017-10-21)

1 change: 1 addition & 0 deletions docs/sources/USER_GUIDE_INDEX.md
@@ -25,6 +25,7 @@

## `evaluate`
- [bootstrap](user_guide/evaluate/bootstrap.md)
- [bootstrap_point632_score](user_guide/evaluate/bootstrap_point632_score.md)
- [BootstrapOutOfBag](user_guide/evaluate/BootstrapOutOfBag.md)
- [confusion_matrix](user_guide/evaluate/confusion_matrix.md)
- [lift_score](user_guide/evaluate/lift_score.md)
253 changes: 253 additions & 0 deletions docs/sources/user_guide/evaluate/bootstrap_point632_score.ipynb
@@ -0,0 +1,253 @@
{
"cells": [
{
"cell_type": "markdown",
"metadata": {},
"source": [
"# bootstrap_point632_score"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"An implementation of the .632 bootstrap to evaluate supervised learning algorithms."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"> `from mlxtend.evaluate import bootstrap_point632_score` "
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Overview"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"Originally, the bootstrap method aims to determine the statistical properties of an estimator when the underlying distribution was unknown and additional samples are not available. Now, in order to exploit this method for the evaluation of predictive models, such as hypotheses for classification and regression, we may prefer a slightly different approach to bootstrapping using the so-called Out-Of-Bag (OOB) or Leave-One-Out Bootstrap (LOOB) technique. Here, we use out-of-bag samples as test sets for evaluation instead of evaluating the model on the training data. Out-of-bag samples are the unique sets of instances that are not used for model fitting as shown in the figure below [1].\n",
"\n",
"![](BootstrapOutOfBag_files/bootstrap_concept.png)\n",
"\n",
"\n",
"The figure above illustrates how three random bootstrap samples drawn from an exemplary ten-sample dataset ($X_1,X_2, ..., X_{10}$) and their out-of-bag sample for testing may look like. In practice, Bradley Efron and Robert Tibshirani recommend drawing 50 to 200 bootstrap samples as being sufficient for reliable estimates [2].\n",
"\n",
"\n",
"\n",
"In 1983, Bradley Efron described the *.632 Estimate*, a further improvement to address the pessimistic bias of the bootstrap cross-validation approach described above (Efron, 1983). The pessimistic bias in the \"classic\" bootstrap method can be attributed to the fact that the bootstrap samples only contain approximately 63.2% of the unique samples from the original dataset. For instance, we can compute the probability that a given sample from a dataset of size *n* is *not* drawn as a bootstrap sample as\n",
"\n",
"$$P (\\text{not chosen}) = \\bigg(1 - \\frac{1}{n}\\bigg)^n,$$\n",
"\n",
"which is asymptotically equivalent to $\\frac{1}{e} \\approx 0.368$ as $n \\rightarrow \\infty.$\n",
"\n",
"Vice versa, we can then compute the probability that a sample *is* chosen as $P (\\text{chosen}) = 1 - \\bigg(1 - \\frac{1}{n}\\bigg)^n \\approx 0.632$ for reasonably large datasets, so that we'd select approximately $0.632 \\times n$ uniques samples as bootstrap training sets and reserve $ 0.368 \\times n $ out-of-bag samples for testing in each iteration.\n",
"\n",
"\n",
"Now, to address the bias that is due to this the sampling with replacement, Bradley Efron proposed the *.632 Estimate* that we mentioned earlier, which is computed via the following equation:\n",
"\n",
"$$\\text{ACC}_{boot} = \\frac{1}{b} \\sum_{i=1}^b \\big(0.632 \\cdot \\text{ACC}_{h, i} + 0.368 \\cdot \\text{ACC}_{r, i}\\big), $$\n",
"\n",
"where $\\text{ACC}_{r, i}$ is the resubstitution accuracy, and $\\text{ACC}_{h, i}$ is the accuracy on the out-of-bag sample."
]
},
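{
"cell_type": "markdown",
"metadata": {},
"source": [
"For a single bootstrap round, the weighted combination can be sketched as follows; the accuracy values here are hypothetical and serve only to illustrate the arithmetic:\n",
"\n",
"```python\n",
"acc_resubstitution = 0.99  # ACC_r: accuracy on the bootstrap training sample\n",
"acc_out_of_bag = 0.90      # ACC_h: accuracy on the out-of-bag sample\n",
"\n",
"acc_632 = 0.632 * acc_out_of_bag + 0.368 * acc_resubstitution\n",
"print('%.4f' % acc_632)  # 0.9331\n",
"```"
]
},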
{
"cell_type": "markdown",
"metadata": {},
"source": [
"### References\n",
"\n",
"- [1] https://sebastianraschka.com/blog/2016/model-evaluation-selection-part2.html\n",
"- [2] Efron, Bradley, and Robert J. Tibshirani. An introduction to the bootstrap. CRC press, 1994. Management of Data (ACM SIGMOD '97), pages 265-276, 1997."
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## Example 1 -- Evaluating the predictive performance of a model"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `bootstrap_point632_score` function mimics the behavior of scikit-learn's `cross_val_score, and a typically usage example is shown below:"
]
},
{
"cell_type": "code",
"execution_count": 1,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"Accuracy: 94.99%\n",
"95% Confidence interval: [90.76, 98.28]\n"
]
}
],
"source": [
"from sklearn import datasets, linear_model\n",
"from mlxtend.evaluate import bootstrap_point632_score\n",
"import numpy as np\n",
"\n",
"iris = datasets.load_iris()\n",
"X = iris.data\n",
"y = iris.target\n",
"lr = linear_model.LogisticRegression()\n",
"\n",
"# Model accuracy\n",
"scores = bootstrap_point632_score(lr, X, y)\n",
"acc = np.mean(scores)\n",
"print('Accuracy: %.2f%%' % (100*acc))\n",
"\n",
"\n",
"# Confidence interval\n",
"lower = np.percentile(scores, 2.5)\n",
"upper = np.percentile(scores, 97.5)\n",
"print('95%% Confidence interval: [%.2f, %.2f]' % (100*lower, 100*upper))"
]
},
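{
"cell_type": "markdown",
"metadata": {},
"source": [
"Since the bootstrap relies on random resampling, the scores vary between runs. As a sketch, the `random_seed` parameter (documented in the API section below) can be set for reproducibility, and `n_splits` controls the number of bootstrap rounds:\n",
"\n",
"```python\n",
"scores = bootstrap_point632_score(lr, X, y, n_splits=100, random_seed=123)\n",
"print('Accuracy: %.2f%%' % (100*np.mean(scores)))\n",
"```"
]
},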
{
"cell_type": "markdown",
"metadata": {},
"source": [
"## API"
]
},
{
"cell_type": "code",
"execution_count": 2,
"metadata": {},
"outputs": [
{
"name": "stdout",
"output_type": "stream",
"text": [
"## bootstrap_point632_score\n",
"\n",
"*bootstrap_point632_score(estimator, X, y, n_splits=200, method='.632', scoring=None, random_seed=None)*\n",
"\n",
"Implementation of the 0.632 bootstrap for supervised learning\n",
"\n",
"**Parameters**\n",
"\n",
"- `estimator` : object\n",
"\n",
" An estimator for classification or regression that\n",
" follows the scikit-learn API and implements \"fit\" and \"predict\"\n",
" methods.\n",
"\n",
"\n",
"- `X` : array-like\n",
"\n",
" The data to fit. Can be, for example a list, or an array at least 2d.\n",
"\n",
"\n",
"- `y` : array-like, optional, default: None\n",
"\n",
" The target variable to try to predict in the case of\n",
" supervised learning.\n",
"\n",
"\n",
"- `n_splits` : int (default=200)\n",
"\n",
" Number of bootstrap iterations.\n",
" Must be larger than 1.\n",
"\n",
"\n",
"- `method` : str (default='.632')\n",
"\n",
" The bootstrap method, which can be either the\n",
" regular '.632' bootstrap (default) or the '.632+'\n",
" bootstrap (not implemented, yet).\n",
"\n",
"\n",
"- `scoring` : str, callable, or None (default: None)\n",
"\n",
" If None (default), uses 'accuracy' for sklearn classifiers\n",
" and 'r2' for sklearn regressors.\n",
" If str, uses a sklearn scoring metric string identifier, for example\n",
" {'accuracy', 'f1', 'precision', 'recall', 'roc_auc', etc.}\n",
" for classifiers,\n",
" {'mean_absolute_error', 'mean_squared_error'/'neg_mean_squared_error',\n",
" 'median_absolute_error', 'r2', etc.} for regressors.\n",
" If a callable object or function is provided, it has to be conform with\n",
" sklearn's signature ``scorer(estimator, X, y)``; see\n",
" http://scikit-learn.org/stable/modules/generated/sklearn.metrics.make_scorer.html\n",
" for more information.\n",
"\n",
"\n",
"- `random_seed` : int (default=None)\n",
"\n",
" If int, random_seed is the seed used by\n",
" the random number generator.\n",
"\n",
"**Returns**\n",
"\n",
"- `scores` : array of float, shape=(len(list(n_splits)),)\n",
"\n",
" Array of scores of the estimator for each bootstrap\n",
" replicate.\n",
"\n",
"**Examples**\n",
"\n",
" >>> from sklearn import datasets, linear_model\n",
" >>> from mlxtend.evaluate import bootstrap_point632_score\n",
" >>> import numpy as np\n",
" >>> iris = datasets.load_iris()\n",
" >>> X = iris.data\n",
" >>> y = iris.target\n",
" >>> lr = linear_model.LogisticRegression()\n",
" >>> scores = bootstrap_point632_score(lr, X, y)\n",
" >>> acc = np.mean(scores)\n",
" >>> print('Accuracy:', acc)\n",
" Accuracy: 0.953023146884\n",
" >>> lower = np.percentile(scores, 2.5)\n",
" >>> upper = np.percentile(scores, 97.5)\n",
" >>> print('95%% Confidence interval: [%.2f, %.2f]' % (lower, upper))\n",
" 95% Confidence interval: [0.90, 0.98]\n",
"\n",
"\n"
]
}
],
"source": [
"with open('../../api_modules/mlxtend.evaluate/bootstrap_point632_score.md', 'r') as f:\n",
" s = f.read() \n",
"print(s)"
]
}
],
"metadata": {
"anaconda-cloud": {},
"kernelspec": {
"display_name": "Python 3",
"language": "python",
"name": "python3"
},
"language_info": {
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"file_extension": ".py",
"mimetype": "text/x-python",
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
"version": "3.6.1"
}
},
"nbformat": 4,
"nbformat_minor": 1
}
4 changes: 3 additions & 1 deletion mlxtend/evaluate/__init__.py
@@ -11,9 +11,11 @@
from .mcnemar import mcnemar
from .bootstrap import bootstrap
from .bootstrap_outofbag import BootstrapOutOfBag
from .bootstrap_point632 import bootstrap_point632_score
from .permutation import permutation_test


__all__ = ["scoring", "confusion_matrix",
"mcnemar_table", "mcnemar", "lift_score",
"bootstrap", "permutation_test", "BootstrapOutOfBag"]
"bootstrap", "permutation_test",
"BootstrapOutOfBag", "bootstrap_point632_score"]