Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

[Bug]: is_image_space works poorly with Gymnasium's FrameStackObservation #2090

Open
5 tasks done
fracapuano opened this issue Feb 25, 2025 · 4 comments
Open
5 tasks done
Labels
bug Something isn't working documentation Improvements or additions to documentation help wanted Help from contributors is welcomed

Comments

@fracapuano
Copy link
Contributor

🐛 Bug

MRC 👇

from stable_baselines3.common.preprocessing import is_image_space
import gymnasium as gym
import numpy as np

from gymnasium.wrappers import FrameStackObservation

image_space = gym.spaces.Box(0, 255, (3, 64, 64), np.uint8)  # a basic RGB 64x64 image

class DummyEnv(gym.Env):
    def __init__(self):
        super().__init__()
        self.observation_space = image_space

env = DummyEnv()
print(is_image_space(env.observation_space))  # True

env = FrameStackObservation(env, stack_size=2)

print(env.observation_space.shape)  # (2, 3, 64, 64), stacked obs on dim=0
print(is_image_space(env.observation_space))  # False

From here it seems to be due to the fact that sb3 expects images to be tensors of dimension strictly equal to 3.

I am wondering why not setting the check to be >=3 instead of strictly equal. In this way, one would still pass the image check (though I reckon might have problems with NatureCNN as I am not sure how it would handle 4-dimensional tensors).

Worth adding that:

  • A workaround this bug is to transform the observation using the TransformObservation env wrapper from gymnasium, merging the two first dimension into one
  • This is precisely what sb3 does in their VecFrameStack!

Worth saying this can have very unintended consequence: if one passes an image through Gymnasium's frame stacking and then uses sb3, the image won't be recognized as such, as the feature extractor for that image will be set to be Flatten (instead of NatureCNN).

Happy to open a PR to change this check, but I wanted to double check it made sense first!

To Reproduce

see above

Relevant log output / Error message

see above

System Info

  • OS: macOS-15.0.1-arm64-arm-64bit Darwin Kernel Version 24.0.0: Tue Sep 24 23:36:26 PDT 2024; root:xnu-11215.1.12~1/RELEASE_ARM64_T8103
  • Python: 3.12.9
  • Stable-Baselines3: 2.5.0
  • PyTorch: 2.6.0
  • GPU Enabled: False
  • Numpy: 2.2.3
  • Cloudpickle: 3.1.1
  • Gymnasium: 1.0.0

Checklist

  • My issue does not relate to a custom gym environment. (Use the custom gym env template instead)
  • I have checked that there is no similar issue in the repo
  • I have read the documentation
  • I have provided a minimal and working example to reproduce the bug
  • I've used the markdown code blocks for both code and stack traces.
@fracapuano fracapuano added the bug Something isn't working label Feb 25, 2025
@araffin
Copy link
Member

araffin commented Feb 25, 2025

Hello,
probably related to #1500.

This is precisely what sb3 does in their VecFrameStack!

In general you should use VecFrameStack() instead yes.

@fracapuano
Copy link
Contributor Author

Thank you! Are you satisfied with this at all? I think the check could be changed, or you reckon that would require too much work?

@araffin araffin added the documentation Improvements or additions to documentation label Mar 1, 2025
@araffin
Copy link
Member

araffin commented Mar 1, 2025

I am wondering why not setting the check to be >=3 instead of strictly equal.

The check is conservative, so that it returns False if there is a doubt.

To me (2, 3, 64, 64) looks like a batch of images and not an image, also it makes the detection of channel-first/last much harder (so the forward pass is likely to fail after).

So I would keep the check as-is but probably add a note in the documentation linking to this issue and VecFrameStack, on https://stable-baselines3.readthedocs.io/en/master/guide/custom_env.html#using-custom-environments

@araffin araffin added the help wanted Help from contributors is welcomed label Mar 1, 2025
@fracapuano
Copy link
Contributor Author

fracapuano commented Mar 2, 2025

Thank you for the extra context---will sort this out opening a PR with this soon then ;)

it makes the detection of channel-first/last much harder
Regarding this point: I agree, and I actually think it'd be best to stack the use a flag (within FrameStackObservation in gymnasium) to "collapse" the stack into one dimension only (thus aligning it with VecFrameStack).

Edit: I actually think it'd be suboptimal, as they already define a TransformObservation wrapper which can be used to achieve exactly the same objective. I will reference this issue in the documentation, warning about unintended side effects (i.e., image observations being flattened!) and providing a small snippet to use TransformObservation in case one cannot use VecFrameStack (that was my case!)

Thanks for discussing! 🙏

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
bug Something isn't working documentation Improvements or additions to documentation help wanted Help from contributors is welcomed
Projects
None yet
Development

No branches or pull requests

2 participants