Describe the bug
Different rewards are generated for an empty (all-zero) action even after setting env.seed and np.random.seed. I tried many versions, including 0.20.0; they all have the problem.
Code example
import gym
import numpy as np

env = gym.make('BipedalWalker-v3')
print('observation space:', env.observation_space)
print('action space:', env.action_space)

def play_one_episode(env, render=False):
    states_list, rewards_list, actions_list = [], [], []
    env.seed(10)
    np.random.seed(10)
    state = env.reset()
    ep_reward = 0
    # print(state)
    for t in range(1, 20):
        states_list.append(state)
        action = np.zeros(4)
        state, reward, done, _ = env.step(action)
        if render:
            env.render()
        rewards_list.append(reward)
        actions_list.append(action)
        ep_reward += reward
        if done:
            break
    return states_list, rewards_list, actions_list, ep_reward

for i in range(100):
    states_list, rewards_list, actions_list, ep_reward = play_one_episode(env)
    print(rewards_list[-1])
FYI- this issue isn't being ignored, it's just that none of the maintainers have had the bandwidth to review it recently. Expect this to be addressed in the next 2-6 weeks.
I tried explicitly setting a seed in Box2D, but that didn't help, so my best bet is that this is an underlying issue with Box2D, probably some numerical errors... Not sure if there's anything to be done on the gym end if the error comes from the simulator. There are plans to replace Box2D with something else (#2358), but that might take a while
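A rough sketch of one way to narrow this down: compare the observation trajectories of two identically seeded rollouts and report the first step where they diverge. The trajectory() helper below is only illustrative, not part of the original report or the gym API.

import gym
import numpy as np

def trajectory(seed=10, steps=200):
    # Build a fresh env, seed everything, and roll out with an all-zero action.
    env = gym.make('BipedalWalker-v3')
    env.seed(seed)
    np.random.seed(seed)
    env.reset()
    observations = []
    for _ in range(steps):
        state, _, done, _ = env.step(np.zeros(4))
        observations.append(state)
        if done:
            break
    return np.array(observations)

a, b = trajectory(), trajectory()
n = min(len(a), len(b))
diff = np.abs(a[:n] - b[:n]).max(axis=1)          # per-step max observation gap
diverged = np.nonzero(diff > 0)[0]
print('first diverging step:', diverged[0] if len(diverged) else 'none')

If the first diverging step is very early and the gap grows from there, that points at the simulator state rather than anything done on the gym side.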
Thanks a ton Ariel, I'm going to close this issue in favor of that. I've used pymunk/chipmunk (the planned replacement) incredibly extensively in RL and haven't found these inconsistencies before.
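For reference, a minimal determinism check of the kind pymunk passes in my experience; this is a toy bouncing-ball scene I made up, not the planned gym integration.

import pymunk

def drop_ball(steps=300):
    # Drop a ball onto a static ground segment and return its final position.
    space = pymunk.Space()
    space.gravity = (0.0, -981.0)
    ball = pymunk.Body(mass=1.0, moment=10.0)
    ball.position = (0.0, 100.0)
    shape = pymunk.Circle(ball, radius=5.0)
    shape.elasticity = 0.8
    ground = pymunk.Segment(space.static_body, (-100.0, 0.0), (100.0, 0.0), 1.0)
    ground.elasticity = 0.8
    space.add(ball, shape, ground)
    for _ in range(steps):
        space.step(1.0 / 60.0)
    return tuple(ball.position)

# Two identical simulations should end in exactly the same state.
print(drop_ball() == drop_ball())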
Output from the terminal for the code example above; the reward is unstable:
action space: Box([-1. -1. -1. -1.], [1. 1. 1. 1.], (4,), float32)
-0.0031776696365959367
-0.00081495544873178
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
-0.0031776696365959367
...
The variance becomes bigger with more iterations.
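If it helps, here is a rough way to quantify that, reusing play_one_episode and env from the code example above; the per-step standard deviation across repeated seeded episodes should grow with the step index if the divergence compounds over time.

import numpy as np

episode_rewards = []
for _ in range(100):
    # Every episode is seeded identically inside play_one_episode.
    _, rewards_list, _, _ = play_one_episode(env)
    episode_rewards.append(rewards_list)

n = min(len(r) for r in episode_rewards)
per_step = np.array([r[:n] for r in episode_rewards])
print(per_step.std(axis=0))  # spread of rewards at each step index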