"ValueError: too many values to unpack" after a couple of evaluations using the EvalCallback #2094

Furkan-rgb opened this issue Mar 2, 2025 · 0 comments
Labels: custom gym env (Issue related to Custom Gym Env)

🐛 Bug

When using a custom PyTorch feature extractor (a SCINet model) with SB3 (PPO), the training eventually crashes at evaluation time with:

ValueError: too many values to unpack (expected 2)
The traceback shows that the error is raised inside PyTorch, in module.train(mode) → named_children(). This suggests that SB3's policy (or my custom module) holds something in _modules that is not a standard str -> nn.Module mapping.

I suspect something is accidentally storing a tuple in self._modules, but I have not found any direct assignment. The "too many values to unpack" error only shows up in stable-baselines3 code when model.predict() is called during evaluation.

Could there be anything in SB3’s base policy or callbacks that might inadvertently store (moduleA, moduleB) in _modules? Or is there some recommended debugging step to see which submodule is a tuple?
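
For reference, here is a small sketch of the kind of check I have in mind (find_bad_modules is just an illustrative helper, and it assumes the model built in the code example below); it walks the policy's module tree and flags any _modules entry that is not a plain nn.Module:

import torch.nn as nn

def find_bad_modules(root, prefix="policy"):
    # _modules should be an ordered dict mapping str -> nn.Module (or None)
    mods = root.__dict__.get("_modules", {})
    if not isinstance(mods, dict):
        print(f"{prefix}: _modules is {type(mods)}, expected a dict")
        return
    for name, child in mods.items():
        path = f"{prefix}.{name}"
        if child is None:
            continue  # PyTorch allows registering None as a placeholder
        if not isinstance(child, nn.Module):
            print(f"{path}: value is {type(child)}, not an nn.Module")
        else:
            find_bad_modules(child, path)

find_bad_modules(model.policy)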

Code example

import gymnasium as gym
import numpy as np
import torch
from gymnasium import spaces
from torch import nn

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.env_checker import check_env
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class MinimalDictEnv(gym.Env):
    """
    A toy Dict environment where observation has a timeseries-like box
    plus some scalars. (Just to mimic a possible timeseries scenario.)
    """
    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Dict({
            "timeseries": spaces.Box(low=-1, high=1, shape=(8, 3), dtype=np.float32),
            "position": spaces.Box(low=0, high=1, shape=(1,), dtype=np.float32),
            "equity_ratio": spaces.Box(low=0, high=1, shape=(1,), dtype=np.float32),
            "drawdown_ratio": spaces.Box(low=0, high=1, shape=(1,), dtype=np.float32),
        })
        # Simple discrete action space
        self.action_space = spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = {
            "timeseries": self.observation_space["timeseries"].sample(),
            "position": self.observation_space["position"].sample(),
            "equity_ratio": self.observation_space["equity_ratio"].sample(),
            "drawdown_ratio": self.observation_space["drawdown_ratio"].sample(),
        }
        return obs, {}

    def step(self, action):
        obs = {
            "timeseries": self.observation_space["timeseries"].sample(),
            "position": self.observation_space["position"].sample(),
            "equity_ratio": self.observation_space["equity_ratio"].sample(),
            "drawdown_ratio": self.observation_space["drawdown_ratio"].sample(),
        }
        reward = 0.0
        terminated = False
        truncated = False
        info = {}
        return obs, reward, terminated, truncated, info

env = MinimalDictEnv()
check_env(env)

class DummySCINet(nn.Module):
    def __init__(self):
        super().__init__()
        # Suppose we have just a single linear for demonstration
        self.linear = nn.Linear(8, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x shape: [batch_size, window_size, num_features] => [B, 8, 3] in this example
        # Flatten over time dimension
        b, t, f = x.shape
        x = x.view(b, t*f)  # shape [B, 24]
        x = self.linear(x[:, :8])  # dummy usage
        # add back a singleton time dimension => shape [B, 1, 1]
        return x.unsqueeze(1)

class SCINetExtractor(BaseFeaturesExtractor):
    def __init__(self, observation_space: spaces.Dict):
        # Suppose final feature dim = 1 (from SCINet) + 3 (3 scalars) = 4
        super().__init__(observation_space, features_dim=4)
        self.scinet = DummySCINet()

    def forward(self, obs_dict):
        # timeseries: [B, 8, 3]
        timeseries = obs_dict["timeseries"]
        position = obs_dict["position"]           # [B, 1]
        equity = obs_dict["equity_ratio"]         # [B, 1]
        drawdown = obs_dict["drawdown_ratio"]     # [B, 1]

        scinet_out = self.scinet(timeseries)      # shape [B, 1, 1]
        scinet_out = scinet_out.squeeze(dim=1)    # [B, 1]

        # concat position/equity/drawdown => [B, 4]
        return torch.cat([scinet_out, position, equity, drawdown], dim=1)


# Vectorized train/eval environments (using the minimal Dict env defined above;
# the real script uses a custom TradingEnv with its own env_kwargs)
train_env = make_vec_env(
    MinimalDictEnv,
    n_envs=24,
    monitor_dir="logs/",
)

val_env = make_vec_env(
    MinimalDictEnv,
    n_envs=8,
    monitor_dir="logs/",
)

policy_kwargs = dict(
    features_extractor_class=SCINetExtractor,
    # (the real script also passes SCINet hyperparameters via features_extractor_kwargs,
    # e.g. hid_size, num_levels, kernel_size, dropout)
    net_arch=[64, 64],
    optimizer_class=torch.optim.AdamW,
    optimizer_kwargs={"weight_decay": 1e-5},
)

model = PPO(
    policy="MultiInputPolicy",
    learning_rate=5e-5,
    n_steps=2048,
    n_epochs=10,
    batch_size=128,
    env=train_env,
    policy_kwargs=policy_kwargs,
    verbose=1,
    device="cuda",
)

eval_callback = EvalCallback(
    eval_env=val_env,
    n_eval_episodes=50,
    verbose=1,
    best_model_save_path="logs/",
    log_path="logs/",
)
model.learn(
    total_timesteps=5_000_000,
    progress_bar=False,
    callback=eval_callback,
)

Relevant log output / Error message

Traceback (most recent call last):
  File "train_ppo.py", line 130, in <module>
    model.learn(
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/ppo/ppo.py", line 311, in learn
    return super().learn(
           ^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 323, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 224, in collect_rollouts
    if not callback.on_step():
           ^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/callbacks.py", line 114, in on_step
    return self._on_step()
           ^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/callbacks.py", line 464, in _on_step
    episode_rewards, episode_lengths = evaluate_policy(
                                       ^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/evaluation.py", line 88, in evaluate_policy
    actions, states = model.predict(
                      ^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/base_class.py", line 557, in predict
    return self.policy.predict(observation, state, episode_start, deterministic)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/policies.py", line 352, in predict
    self.set_training_mode(False)
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/policies.py", line 211, in set_training_mode
    self.train(mode)
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2843, in train
    module.train(mode)
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2843, in train
    module.train(mode)
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2843, in train
    module.train(mode)
  [Previous line repeated 6 more times]
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2842, in train
    for module in self.children():
                  ^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2725, in children
    for _name, module in self.named_children():
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2743, in named_children
    for name, module in self._modules.items():
        ^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)

System Info

No response

Checklist
