
[Bug Report] Unstable reward in BipedalWalker-v3 #2390

Closed
alexxchen opened this issue Sep 3, 2021 · 3 comments

alexxchen commented Sep 3, 2021

Describe the bug
A different reward is generated with an all-zero action, even after setting env.seed and np.random.seed. I tried many versions, including 0.20.0, and they all have this problem.

Code example

import gym
import numpy as np

env = gym.make('BipedalWalker-v3')
print('observation space:', env.observation_space)
print('action space:', env.action_space)

def play_one_episode(env, render=False):
    states_list, rewards_list, actions_list = [], [], []
    # Re-seed the environment and NumPy before every episode, so each
    # episode should start from the same terrain and initial state
    env.seed(10)
    np.random.seed(10)
    state = env.reset()

    ep_reward = 0
    # Take up to 19 steps with an all-zero action for all 4 motors
    for t in range(1, 20):
        states_list.append(state)
        action = np.zeros(4)
        state, reward, done, _ = env.step(action)
        if render:
            env.render()
        rewards_list.append(reward)
        actions_list.append(action)

        ep_reward += reward
        if done:
            break
    return states_list, rewards_list, actions_list, ep_reward

# Run 100 identically seeded episodes; the last reward should be identical every time
for i in range(100):
    states_list, rewards_list, actions_list, ep_reward = play_one_episode(env)
    print(rewards_list[-1])

Output from the terminal; the reward is unstable:
action space: Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
-0.0031776696365959367
-0.00081495544873178
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
...

The variance becomes larger with more iterations.
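The comparison above can be automated: run the same seeded rollout several times and report which runs differ from the first one. This is a minimal sketch; `check_reproducible`, `ToyEnv` and `toy_rollout` are hypothetical names (not part of gym), and `ToyEnv` merely stands in for `BipedalWalker-v3` so the helper can be run without Box2D installed.

```python
import random
from typing import Callable, List


def check_reproducible(rollout: Callable[[int], List[float]], seed: int,
                       n_runs: int = 10) -> List[int]:
    """Call rollout(seed) n_runs times and return the indices of the runs whose
    reward sequence is not exactly equal to that of run 0.

    Exact float equality is intentional: a seeded simulation is expected to be
    bit-for-bit reproducible, so any difference at all is reported.
    """
    reference = rollout(seed)
    return [i for i in range(1, n_runs) if rollout(seed) != reference]


class ToyEnv:
    """Hypothetical stand-in for a seeded environment (not part of gym)."""

    def seed(self, seed: int) -> None:
        self.rng = random.Random(seed)

    def step(self, action: float) -> float:
        # The reward depends only on the seeded RNG and the action
        return self.rng.random() * action - 0.01


def toy_rollout(seed: int, n_steps: int = 19) -> List[float]:
    env = ToyEnv()
    env.seed(seed)
    return [env.step(0.5) for _ in range(n_steps)]


print(check_reproducible(toy_rollout, seed=10))  # []
```

With the script above, `rollout` could be `lambda seed: play_one_episode(env)[1]`, since the second return value is `rewards_list`; note that `play_one_episode` ignores the argument and always seeds with 10.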

@jkterry1
Collaborator

FYI: this issue isn't being ignored; none of the maintainers have had the bandwidth to review it recently. Expect this to be addressed in the next 2-6 weeks.

@RedTachyon
Contributor

I looked into it a bit; it seems the inconsistency happens in this line:
self.world.Step(1.0 / FPS, 6 * 30, 2 * 30) (https://github.com/openai/gym/blob/master/gym/envs/box2d/bipedal_walker.py#L429)
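One way to narrow an inconsistency like this down to a specific step is to record the observation after every step in two seeded runs and find the first step where they disagree. A minimal sketch, assuming each trajectory is a list of per-step observation vectors; `first_divergence` is a hypothetical helper name, not a gym function.

```python
from typing import List, Optional, Sequence


def first_divergence(run_a: List[Sequence[float]],
                     run_b: List[Sequence[float]],
                     tol: float = 0.0) -> Optional[int]:
    """Return the first step index at which two recorded trajectories differ,
    or None if they agree everywhere.

    Each trajectory is a list of per-step observation vectors. With tol=0.0 the
    values must match exactly; a positive tol ignores smaller differences.
    """
    for t, (obs_a, obs_b) in enumerate(zip(run_a, run_b)):
        if len(obs_a) != len(obs_b):
            return t
        if any(abs(x - y) > tol for x, y in zip(obs_a, obs_b)):
            return t
    # If one episode ended earlier, the runs diverge where the shorter one stops
    if len(run_a) != len(run_b):
        return min(len(run_a), len(run_b))
    return None


a = [[0.0, 1.0], [0.5, 1.5], [1.0, 2.0]]
b = [[0.0, 1.0], [0.5, 1.5000001], [1.0, 2.0]]
print(first_divergence(a, b))            # 1
print(first_divergence(a, b, tol=1e-3))  # None
```

In the reporter's script, the `states_list` values returned by two calls to `play_one_episode` could be passed in directly, since each recorded state is a 1-D NumPy array.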

I tried explicitly setting a seed in Box2D, but that didn't help, so my best guess is that this is an underlying issue in Box2D itself, probably some numerical error. I'm not sure there's anything to be done on the gym end if the error comes from the simulator. There are plans to replace Box2D with something else (#2358), but that might take a while.
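The "numerical errors" hypothesis is plausible because floating-point addition is not associative: summing the same values in a different order can change the result, and `Step` runs an iterative constraint solver (its last two arguments, `6 * 30` and `2 * 30`, are the velocity and position iteration counts) that can amplify tiny differences. The sketch below only demonstrates the arithmetic effect in plain Python; it is not Box2D code, and whether Box2D's evaluation order actually varies between runs is an assumption that this thread does not establish.

```python
# Plain-Python illustration of floating-point non-associativity
# (not Box2D code).

def naive_sum(xs):
    """Left-to-right summation, exactly as written (no compensation)."""
    total = 0.0
    for x in xs:
        total += x
    return total


# Regrouping three additions changes the rounded result.
a, b, c = 0.1, 0.2, 0.3
left = (a + b) + c    # 0.6000000000000001
right = a + (b + c)   # 0.6
print(left == right)  # False

# Reordering the same terms changes the sum entirely:
# 1e16 + 1.0 rounds back to 1e16, so the 1.0 is lost in the second order.
print(naive_sum([1e16, -1e16, 1.0]))  # 1.0
print(naive_sum([1.0, 1e16, -1e16]))  # 0.0
```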

@jkterry1
Collaborator

Thanks a ton, Ariel. I'm going to close this issue in favor of that. I've used pymunk/chipmunk (the planned replacement) extensively in RL and haven't found these inconsistencies before.
