[Question] How to zero out the reward sum ret_ = rewards[r] + gamma * ret_ #2089
❓ How to zero out the reward sum ret_ = rewards[r] + gamma * ret_
Hello, I made a custom environment similar to Atari 2600 Pong, and my custom A2C implementation has this code
The lines below are responsible for zeroing out the reward sum ret_ = rewards[r] + gamma * ret_ (like here, line 44) when it reaches the boundary between game episodes. For example, an episode ends when the first player wins it, and there can be several episodes before someone scores 21 points.
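To make the question concrete, here is a minimal sketch of the reset I mean. The function name `discounted_returns` and the `dones` array (True on the last step of each episode) are hypothetical names used for illustration; they are not taken from my original code or from Stable Baselines3.

```python
import numpy as np

def discounted_returns(rewards, dones, gamma=0.99):
    """Accumulate ret_ = rewards[r] + gamma * ret_ backwards in time,
    zeroing ret_ whenever step r is the last step of an episode."""
    returns = np.zeros(len(rewards), dtype=np.float64)
    ret_ = 0.0
    for r in reversed(range(len(rewards))):
        if dones[r]:
            # Episode boundary: rewards from the next episode must not leak back.
            ret_ = 0.0
        ret_ = rewards[r] + gamma * ret_
        returns[r] = ret_
    return returns

# Two short episodes of two steps each, each ending with a reward of 1.
example = discounted_returns([0.0, 1.0, 0.0, 1.0], [False, True, False, True], gamma=0.5)
```

Without the reset, the return at the end of the first episode would also include discounted reward from the second one.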
I found this code here:
rollout_buffer.compute_returns_and_advantage(last_values=values, dones=dones)
and here, where I think the relevant variable is last_gae_lam:
last_gae_lam = delta + self.gamma * self.gae_lambda * next_non_terminal * last_gae_lam
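As far as I can tell, the `next_non_terminal` factor in this line is what performs the reset: when it is 0, the accumulated `last_gae_lam` from the following episode is multiplied away. Below is a simplified NumPy sketch of that recursion as I understand it. It is not the library's actual implementation; the function name `gae_advantages` and its arguments (`episode_starts`, `last_value`, `last_done`) are assumptions made for illustration.

```python
import numpy as np

def gae_advantages(rewards, values, episode_starts, last_value, last_done,
                   gamma=0.99, gae_lambda=1.0):
    """Backward GAE recursion with a non-terminal mask (illustrative sketch).

    episode_starts[t] is 1 if step t is the first step of a new episode,
    so 1 - episode_starts[t + 1] is 0 exactly when step t ended an episode.
    """
    rewards = np.asarray(rewards, dtype=np.float64)
    values = np.asarray(values, dtype=np.float64)
    n = len(rewards)
    advantages = np.zeros(n, dtype=np.float64)
    last_gae_lam = 0.0
    for step in reversed(range(n)):
        if step == n - 1:
            next_non_terminal = 1.0 - float(last_done)
            next_value = last_value
        else:
            next_non_terminal = 1.0 - float(episode_starts[step + 1])
            next_value = values[step + 1]
        delta = rewards[step] + gamma * next_value * next_non_terminal - values[step]
        # A mask of 0 drops everything accumulated from the next episode.
        last_gae_lam = delta + gamma * gae_lambda * next_non_terminal * last_gae_lam
        advantages[step] = last_gae_lam
    returns = advantages + values
    return advantages, returns

# Same two-episode rollout as a sanity check, with a zero value estimate.
adv, ret = gae_advantages(
    rewards=[0.0, 1.0, 0.0, 1.0],
    values=[0.0, 0.0, 0.0, 0.0],
    episode_starts=[1, 0, 1, 0],
    last_value=0.0,
    last_done=True,
    gamma=0.5,
    gae_lambda=1.0,
)
```

With `gae_lambda=1.0` and zero value estimates, the returns from this sketch reduce to the plain discounted reward sum with a reset at each episode boundary.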
How do I zero out the reward sum in Stable Baselines3 A2C?