
All metalearner coefficients are zero, predictions will all be equal to 0 #155

Open
DavidMarguerit opened this issue Jan 14, 2025 · 3 comments

@DavidMarguerit

I am using SuperLearner to predict an outcome with a Random Forest algorithm. However, the Random Forest predicts only 0, and I don't understand how to fix this.

Here is a reproducible example:

#Training dataset
y <- c(-4.605170, 9.181019, -4.605170, -4.605170, 5.998099, -4.605170, -4.605170, -4.605170, -4.605170, -4.605170, -4.605170, -4.605170, -4.605170, -4.605170, 8.788880, -4.605170, 7.259213, -4.605170, -4.605170, -4.605170, -4.605170, 8.851838, 8.182144, -4.605170, -4.605170, -4.605170, 8.824345, -4.605170, -4.605170, 8.824345, -4.605170, -4.605170, -4.605170, 9.195547, 8.214720, 8.374350, 6.971533)

weightML <- c(14.95239, 18.55120, 18.55120, 19.70231, 14.95239, 14.95239, 18.55120, 14.95239, 18.55120, 18.55120, 18.55120, 14.95239, 18.55120, 15.73830, 18.55120, 18.55120, 19.70231, 15.73830, 14.95239, 15.73830, 14.95239, 14.95239, 15.73830, 18.55120, 18.55120, 14.95239, 14.95239, 14.95239, 14.95239, 15.73830, 14.95239, 14.95239, 14.95239, 14.95239, 18.55120, 19.70231, 14.95239)

train_x <- data.frame(matrix(, nrow = length(y), ncol = 0))
train_x$x1 <- sample(100, size = nrow(df), replace = TRUE)
train_x$x2 <- sample(100, size = nrow(df), replace = TRUE)
train_x$x3 <- sample(100, size = nrow(df), replace = TRUE)
train_x$x4 <- sample(100, size = nrow(df), replace = TRUE)

#Test dataset
test_x <- data.frame(matrix(, nrow = length(y), ncol = 0))
test_x$x1 <- sample(100, size = nrow(df), replace = TRUE)
test_x$x2 <- sample(100, size = nrow(df), replace = TRUE)
test_x$x3 <- sample(100, size = nrow(df), replace = TRUE)
test_x$x4 <- sample(100, size = nrow(df), replace = TRUE)

# RF
rf <- SuperLearner(Y = y, X = train_x, family = gaussian(), SL.library = "SL.ranger", obsWeights = weightML)
predict(rf, test_x, onlySL = TRUE)$pred

This code returns the following output:

> rf <- SuperLearner(Y = y, X = train_x, family = gaussian(), SL.library = "SL.ranger", obsWeights = weightML)
Warning messages:
1: All algorithms have zero weight 
2: All metalearner coefficients are zero, predictions will all be equal to 0 
> predict(rf, test_x, onlySL = TRUE)$pred
 [1] 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Warning message:
All metalearner coefficients are zero, predictions will all be equal to 0 

Any idea why it predicts only 0 and how I can fix the issue?

I noticed that if I replace y with y + 2, the prediction works. For instance:

#Same y, weightML, train_x, and test_x as above, with the outcome shifted by 2
y <- y + 2

# RF
rf <- SuperLearner(Y = y, X = train_x, family = gaussian(), SL.library = "SL.ranger", obsWeights = weightML)
predict(rf, test_x, onlySL = TRUE)$pred
@ecpolley (Owner)

To make this a reproducible example, you should set the random seed (and you need to define df, which the sample() calls reference but which is never created):

library(SuperLearner)
set.seed(42)
df <- data.frame(y, weightML)

But this result isn't unexpected. There is no information in the X variables, so predicting Y = 0 for everyone is better than using the ranger predictions, which are not informative. Since the (weighted) mean of Y is close to 0, even adding SL.mean to the candidate library is unlikely to help much; that is also why shifting the mean value of Y gives some weight to the ranger predictions. (If you add SL.mean to the candidate library here, it will get weight 1, again because the X variables are not informative.)
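To see this concretely, here is a minimal sketch (assuming the corrected setup above, i.e. df defined and a seed set) that adds SL.mean to the candidate library and inspects the fitted object's CV risks and metalearner weights:

# Sketch: compare SL.ranger against the SL.mean benchmark
# (y, train_x, and weightML as in the corrected example above)
sl <- SuperLearner(Y = y, X = train_x, family = gaussian(),
                   SL.library = c("SL.mean", "SL.ranger"),
                   obsWeights = weightML)
sl$cvRisk  # cross-validated risk for each candidate algorithm
sl$coef    # metalearner weights; SL.mean should get essentially all the weight here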

@DavidMarguerit (Author)

Thank you for your answer; it helps me better understand what is happening.

You are correct that in my example x1, x2, x3, and x4 are uninformative, since they are random. However, in my real data they are informative: Y measures log hourly wages, and x1, x2, x3, and x4 are confounders for age, working experience, household composition, and education, respectively. I am sure these confounders matter for wages, yet I get the same warning message and output as in my example.

@ecpolley (Owner)

What are you using as the library of candidate algorithms? You may want to try expanding the candidates, and you could look at the CV risk estimates relative to SL.mean to confirm your assumption that the variables/algorithms are informative.
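For example, one hedged sketch (the extra candidate SL.glm and the choice of V = 5 are illustrative, not recommendations): CV.SuperLearner adds an external cross-validation layer, and summary() reports each candidate's CV risk alongside SL.mean:

# Sketch: external CV over a broader candidate library, with SL.mean as benchmark
cv_sl <- CV.SuperLearner(Y = y, X = train_x, family = gaussian(),
                         SL.library = c("SL.mean", "SL.glm", "SL.ranger"),
                         obsWeights = weightML, V = 5)
summary(cv_sl)  # candidates should beat SL.mean if the X variables are informative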
