"ValueError: too many values to unpack" after a couple of evaluations using the EvalCallback #2094

Furkan-rgb opened this issue Mar 2, 2025 · 0 comments
Labels: custom gym env (Issue related to Custom Gym Env)

🐛 Bug

When using a custom PyTorch feature extractor (a SCINet model) with SB3 (PPO), the training eventually crashes at evaluation time with:

ValueError: too many values to unpack (expected 2)
The traceback shows that the error is raised inside PyTorch, in module.train(mode) → named_children(). This suggests that SB3's policy (or my custom module) holds something in _modules that is not a standard str -> nn.Module mapping.

I suspect something is accidentally storing a tuple in self._modules, but I have not found any direct assignment. The "too many values to unpack" error only shows up in stable-baselines3 code when model.predict() is called during evaluation.

Could there be anything in SB3’s base policy or callbacks that might inadvertently store (moduleA, moduleB) in _modules? Or is there some recommended debugging step to see which submodule is a tuple?
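
For reference, here is a small sketch of the kind of check I have in mind (find_bad_modules is just an illustrative helper, and it assumes the model built in the code example below); it walks the policy's module tree and flags any _modules entry that is not a plain nn.Module:

import torch.nn as nn

def find_bad_modules(root, prefix="policy"):
    # _modules should be an ordered dict mapping str -> nn.Module (or None)
    mods = root.__dict__.get("_modules", {})
    if not isinstance(mods, dict):
        print(f"{prefix}: _modules is {type(mods)}, expected a dict")
        return
    for name, child in mods.items():
        path = f"{prefix}.{name}"
        if child is None:
            continue  # PyTorch allows registering None as a placeholder
        if not isinstance(child, nn.Module):
            print(f"{path}: value is {type(child)}, not an nn.Module")
        else:
            find_bad_modules(child, path)

find_bad_modules(model.policy)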

Code example

import gymnasium as gym
import numpy as np
import torch
from gymnasium import spaces
from torch import nn

from stable_baselines3 import PPO
from stable_baselines3.common.callbacks import EvalCallback
from stable_baselines3.common.env_checker import check_env
from stable_baselines3.common.env_util import make_vec_env
from stable_baselines3.common.torch_layers import BaseFeaturesExtractor

class MinimalDictEnv(gym.Env):
    """
    A toy Dict environment where observation has a timeseries-like box
    plus some scalars. (Just to mimic a possible timeseries scenario.)
    """
    def __init__(self):
        super().__init__()
        self.observation_space = spaces.Dict({
            "timeseries": spaces.Box(low=-1, high=1, shape=(8, 3), dtype=np.float32),
            "position": spaces.Box(low=0, high=1, shape=(1,), dtype=np.float32),
            "equity_ratio": spaces.Box(low=0, high=1, shape=(1,), dtype=np.float32),
            "drawdown_ratio": spaces.Box(low=0, high=1, shape=(1,), dtype=np.float32),
        })
        # Simple discrete action space
        self.action_space = spaces.Discrete(2)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        obs = {
            "timeseries": self.observation_space["timeseries"].sample(),
            "position": self.observation_space["position"].sample(),
            "equity_ratio": self.observation_space["equity_ratio"].sample(),
            "drawdown_ratio": self.observation_space["drawdown_ratio"].sample(),
        }
        return obs, {}

    def step(self, action):
        obs = {
            "timeseries": self.observation_space["timeseries"].sample(),
            "position": self.observation_space["position"].sample(),
            "equity_ratio": self.observation_space["equity_ratio"].sample(),
            "drawdown_ratio": self.observation_space["drawdown_ratio"].sample(),
        }
        reward = 0.0
        terminated = False
        truncated = False
        info = {}
        return obs, reward, terminated, truncated, info

env = MinimalDictEnv()
check_env(env)

class DummySCINet(nn.Module):
    def __init__(self):
        super().__init__()
        # Suppose we have just a single linear for demonstration
        self.linear = nn.Linear(8, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x shape: [batch_size, window_size, num_features] => [B, 8, 3] in this example
        # Flatten over time dimension
        b, t, f = x.shape
        x = x.view(b, t*f)  # shape [B, 24]
        x = self.linear(x[:, :8])  # dummy usage
        # add back a singleton time dimension => shape [B, 1, 1]
        return x.unsqueeze(1)

class SCINetExtractor(BaseFeaturesExtractor):
    def __init__(self, observation_space: spaces.Dict):
        # Suppose final feature dim = 1 (from SCINet) + 3 (3 scalars) = 4
        super().__init__(observation_space, features_dim=4)
        self.scinet = DummySCINet()

    def forward(self, obs_dict):
        # timeseries: [B, 8, 3]
        timeseries = obs_dict["timeseries"]
        position = obs_dict["position"]           # [B, 1]
        equity = obs_dict["equity_ratio"]         # [B, 1]
        drawdown = obs_dict["drawdown_ratio"]     # [B, 1]

        scinet_out = self.scinet(timeseries)      # shape [B, 1, 1]
        scinet_out = scinet_out.squeeze(dim=1)    # [B, 1]

        # concat position/equity/drawdown => [B, 4]
        return torch.cat([scinet_out, position, equity, drawdown], dim=1)


# Vectorized train/eval environments (using the minimal Dict env defined above;
# the real script uses a custom TradingEnv with its own env_kwargs)
train_env = make_vec_env(
    MinimalDictEnv,
    n_envs=24,
    monitor_dir="logs/",
)

val_env = make_vec_env(
    MinimalDictEnv,
    n_envs=8,
    monitor_dir="logs/",
)

policy_kwargs = dict(
    features_extractor_class=SCINetExtractor,
    # (the real script also passes SCINet hyperparameters via features_extractor_kwargs,
    # e.g. hid_size, num_levels, kernel_size, dropout)
    net_arch=[64, 64],
    optimizer_class=torch.optim.AdamW,
    optimizer_kwargs={"weight_decay": 1e-5},
)

model = PPO(
    policy="MultiInputPolicy",
    learning_rate=5e-5,
    n_steps=2048,
    n_epochs=10,
    batch_size=128,
    env=train_env,
    policy_kwargs=policy_kwargs,
    verbose=1,
    device="cuda",
)

eval_callback = EvalCallback(
    eval_env=val_env,
    n_eval_episodes=50,
    verbose=1,
    best_model_save_path="logs/",
    log_path="logs/",
)
model.learn(
    total_timesteps=5_000_000,
    progress_bar=False,
    callback=eval_callback,
)

Relevant log output / Error message

Traceback (most recent call last):
  File "train_ppo.py", line 130, in <module>
    model.learn(
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/ppo/ppo.py", line 311, in learn
    return super().learn(
           ^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 323, in learn
    continue_training = self.collect_rollouts(self.env, callback, self.rollout_buffer, n_rollout_steps=self.n_steps)
                        ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/on_policy_algorithm.py", line 224, in collect_rollouts
    if not callback.on_step():
           ^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/callbacks.py", line 114, in on_step
    return self._on_step()
           ^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/callbacks.py", line 464, in _on_step
    episode_rewards, episode_lengths = evaluate_policy(
                                       ^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/evaluation.py", line 88, in evaluate_policy
    actions, states = model.predict(
                      ^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/base_class.py", line 557, in predict
    return self.policy.predict(observation, state, episode_start, deterministic)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/policies.py", line 352, in predict
    self.set_training_mode(False)
  File "/.venv/lib/python3.12/site-packages/stable_baselines3/common/policies.py", line 211, in set_training_mode
    self.train(mode)
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2843, in train
    module.train(mode)
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2843, in train
    module.train(mode)
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2843, in train
    module.train(mode)
  [Previous line repeated 6 more times]
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2842, in train
    for module in self.children():
                  ^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2725, in children
    for _name, module in self.named_children():
                         ^^^^^^^^^^^^^^^^^^^^^
  File "/.venv/lib/python3.12/site-packages/torch/nn/modules/module.py", line 2743, in named_children
    for name, module in self._modules.items():
        ^^^^^^^^^^^^
ValueError: too many values to unpack (expected 2)

System Info

No response

Checklist
