bias_variance_decomp bug: numpy.zeros truncating predictions #743

Closed
johnnybarrels opened this issue Oct 18, 2020 · 1 comment · Fixed by #749

@johnnybarrels

Bug description

The predictions matrix all_pred, initialised via np.zeros(..., dtype=np.int) on line 73 of bias_variance_decomp(), truncates predictions by casting them to integers:

all_pred = np.zeros((num_rounds, y_test.shape[0]), dtype=np.int)

Example of numpy behaviour causing the issue:

import numpy as np
np.__version__  # 1.19.2 (current latest)

all_pred = np.zeros((2, 3), dtype=np.int)
all_pred[0] = [0.25, 0.5, 0.75]  # float assignments are silently truncated to int
all_pred[1] = [1.3, 1.6, 1.9]
all_pred
array([[0, 0, 0],
       [1, 1, 1]])

This causes wildly inaccurate results when the target variable is small, since predictions are truncated to integers. Regardless of scale, casting predictions to integers doesn't strike me as a desired feature of the bias_variance_decomp() function.
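For comparison, here is the same toy example if dtype=np.int is simply dropped so that np.zeros falls back to its default float64 dtype (the change whose results are shown further below):

import numpy as np

all_pred = np.zeros((2, 3))  # default dtype is float64, so predictions are stored exactly
all_pred[0] = [0.25, 0.5, 0.75]
all_pred[1] = [1.3, 1.6, 1.9]
all_pred
# array([[0.25, 0.5 , 0.75],
#        [1.3 , 1.6 , 1.9 ]])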

See this gist for a full reproducible example of this, but below are the differences in results in a regression case with a small target variable:

Unchanged function results:

print(avg_expected_loss)
print(avg_bias)
print(avg_var)
0.2826888888888888
0.2698977777777778
0.012791111111111112

Results after removing dtype=np.int from np.zeros() in all_pred initialisation:

print(avg_expected_loss)
print(avg_bias)
print(avg_var)
0.039183805200284395
0.03825420409046315
0.0009296011098212146

Steps/Code to Reproduce

See this gist.
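The gist itself is not reproduced here; as an illustration only, a minimal reproduction along the following lines (with a made-up small-valued regression target and model, not the exact data from the gist) shows the effect:

import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeRegressor
from mlxtend.evaluate import bias_variance_decomp

# Illustrative data only: any regression target with values well below 1
# makes the integer truncation obvious.
rng = np.random.RandomState(123)
X = rng.rand(500, 3)
y = 0.3 * X[:, 0] + 0.1 * rng.rand(500)  # targets roughly in [0, 0.4]

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=123)

avg_expected_loss, avg_bias, avg_var = bias_variance_decomp(
    DecisionTreeRegressor(random_state=123),
    X_train, y_train, X_test, y_test,
    loss='mse', num_rounds=50, random_seed=123)

print(avg_expected_loss)
print(avg_bias)
print(avg_var)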

Versions

MLxtend 0.17.3
macOS-10.15.6-x86_64-i386-64bit
Python 3.8.3 (v3.8.3:6f8c8320e9, May 13 2020, 16:29:34)
[Clang 6.0 (clang-600.0.57)]
Scikit-learn 0.23.2
NumPy 1.19.2
SciPy 1.5.2

@rasbt (Owner) commented Nov 10, 2020

Wow, good catch. Yeah, the examples and unit tests for the MSE loss were all with relatively large numbers so I didn't notice that. That's going to be fixed via #749. Many thanks.
