Describe the bug
Different rewards are generated for an empty (all-zero) action even after setting env.seed and np.random.seed. I tried many versions, including 0.20.0; they all have the problem.
Code example
import gym
import numpy as np

env = gym.make('BipedalWalker-v3')
print('observation space:', env.observation_space)
print('action space:', env.action_space)

def play_one_episode(env, render=False):
    states_list, rewards_list, actions_list = [], [], []
    env.seed(10)
    np.random.seed(10)
    state = env.reset()
    ep_reward = 0
    # print(state)
    for t in range(1, 20):
        states_list.append(state)
        action = np.zeros(4)
        state, reward, done, _ = env.step(action)
        if render:
            env.render()
        rewards_list.append(reward)
        actions_list.append(action)
        ep_reward += reward
        if done:
            break
    return states_list, rewards_list, actions_list, ep_reward

for i in range(100):
    states_list, rewards_list, actions_list, ep_reward = play_one_episode(env)
    print(rewards_list[-1])
FYI- this issue isn't being ignored, it's just that none of the maintainers have had the bandwidth to review it recently. Expect this to be addressed in the next 2-6 weeks.
I tried explicitly setting a seed in Box2D, but that didn't help, so my best bet is that this is an underlying issue with Box2D, probably some numerical errors... Not sure if there's anything to be done on the gym end if the error comes from the simulator. There are plans to replace Box2D with something else (#2358), but that might take a while
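A rough sketch of one way to narrow this down: compare the observation trajectories of two identically seeded rollouts and report the first step where they diverge. The trajectory() helper below is only illustrative, not part of the original report or the gym API.

import gym
import numpy as np

def trajectory(seed=10, steps=200):
    # Build a fresh env, seed everything, and roll out with an all-zero action.
    env = gym.make('BipedalWalker-v3')
    env.seed(seed)
    np.random.seed(seed)
    env.reset()
    observations = []
    for _ in range(steps):
        state, _, done, _ = env.step(np.zeros(4))
        observations.append(state)
        if done:
            break
    return np.array(observations)

a, b = trajectory(), trajectory()
n = min(len(a), len(b))
diff = np.abs(a[:n] - b[:n]).max(axis=1)          # per-step max observation gap
diverged = np.nonzero(diff > 0)[0]
print('first diverging step:', diverged[0] if len(diverged) else 'none')

If the first diverging step is very early and the gap grows from there, that points at the simulator state rather than anything done on the gym side.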
Thanks a ton Ariel, I'm going to close this issue in favor of that. I've used pymunk/chipmunk (the planned replacement) incredibly extensively in RL and haven't found these inconsistencies before.
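For reference, a minimal determinism check of the kind pymunk passes in my experience; this is a toy bouncing-ball scene I made up, not the planned gym integration.

import pymunk

def drop_ball(steps=300):
    # Drop a ball onto a static ground segment and return its final position.
    space = pymunk.Space()
    space.gravity = (0.0, -981.0)
    ball = pymunk.Body(mass=1.0, moment=10.0)
    ball.position = (0.0, 100.0)
    shape = pymunk.Circle(ball, radius=5.0)
    shape.elasticity = 0.8
    ground = pymunk.Segment(space.static_body, (-100.0, 0.0), (100.0, 0.0), 1.0)
    ground.elasticity = 0.8
    space.add(ball, shape, ground)
    for _ in range(steps):
        space.step(1.0 / 60.0)
    return tuple(ball.position)

# Two identical simulations should end in exactly the same state.
print(drop_ball() == drop_ball())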
Output from the terminal for the code example above; the reward is unstable:
action space: Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
-0.0031776696365959367
-0.00081495544873178
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
...
The variance becomes bigger with more iterations.
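If it helps, here is a rough way to quantify that, reusing play_one_episode and env from the code example above; the per-step standard deviation across repeated seeded episodes should grow with the step index if the divergence compounds over time.

import numpy as np

episode_rewards = []
for _ in range(100):
    # Every episode is seeded identically inside play_one_episode.
    _, rewards_list, _, _ = play_one_episode(env)
    episode_rewards.append(rewards_list)

n = min(len(r) for r in episode_rewards)
per_step = np.array([r[:n] for r in episode_rewards])
print(per_step.std(axis=0))  # spread of rewards at each step index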