The recommended gain for weight initialization depends on the activation function in use (see the torch docs). At the moment, however, the gains are hard-coded and always the same in ActorCriticPolicies. See here.
I recommend making the gains depend on the activation function used (probably mainly ReLU and tanh).
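A minimal sketch of what this could look like, assuming orthogonal init and the gains from `torch.nn.init.calculate_gain`. The helper names (`gain_for_activation`, `init_orthogonal`, `build_mlp`) and the fallback to `sqrt(2)` for unmapped activations are hypothetical and are not part of the existing code base:

```python
import math

import torch.nn as nn

# Hypothetical mapping from activation class to the name expected by
# torch.nn.init.calculate_gain. Only ReLU and tanh are covered here.
_GAIN_NAMES = {nn.ReLU: "relu", nn.Tanh: "tanh"}


def gain_for_activation(activation_fn, default=math.sqrt(2)):
    """Return the torch-recommended gain for an activation class.

    Falls back to `default` (the current constant) for unmapped activations.
    """
    name = _GAIN_NAMES.get(activation_fn)
    if name is None:
        return default
    return nn.init.calculate_gain(name)


def init_orthogonal(module, gain):
    """Orthogonally initialize a linear layer's weights and zero its bias."""
    if isinstance(module, nn.Linear):
        nn.init.orthogonal_(module.weight, gain=gain)
        if module.bias is not None:
            nn.init.constant_(module.bias, 0.0)


def build_mlp(in_dim, hidden_dim, activation_fn):
    """Build a two-layer MLP whose init gain follows the activation."""
    net = nn.Sequential(
        nn.Linear(in_dim, hidden_dim),
        activation_fn(),
        nn.Linear(hidden_dim, hidden_dim),
        activation_fn(),
    )
    gain = gain_for_activation(activation_fn)
    # nn.Module.apply visits every submodule, so each Linear gets initialized.
    net.apply(lambda m: init_orthogonal(m, gain))
    return net
```

With this mapping, `nn.ReLU` keeps the current gain of `sqrt(2)`, while `nn.Tanh` would switch to `5/3`, so only tanh policies would change behavior.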
If you agree with this, I would like to implement it myself and open a PR.
Thanks and a good day!
To Reproduce
--
Relevant log output / Error message
--
System Info
--
Checklist
I have checked that there is no similar issue in the repo
The gains are taken from OpenAI Baselines to keep results consistent. Unlike for other initialization choices, I haven't seen any investigation of the effect of the gain so far (this would already be a good contribution), or at least of whether using tanh/ReLU with a constant gain has an effect.
Yes, I am talking about orthogonal init. I agree that it is useful to keep it consistent with OpenAI Baselines. A study of the effect of the gain on convergence would be useful.
It seems like a coincidence (?) that the standard gain listed for ReLU, for any initialization, is also sqrt(2) (Link). The gain implemented in OpenAI Baselines and sb3 is also sqrt(2). Maybe they just used ReLU by default and never investigated the gain?
One study that partly investigates the impact of weight init is this. They find:
- initializing the policy MLP with smaller weights in the last layer
- network initialization scheme (C56) does not matter too much
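The first finding can be sketched as follows, assuming orthogonal init as discussed above. The function name `init_policy_mlp` and the last-layer gain of `0.01` are illustrative assumptions, not values taken from the study:

```python
import math

import torch.nn as nn


def init_policy_mlp(policy_net, hidden_gain=math.sqrt(2), last_layer_gain=0.01):
    """Orthogonal init with a much smaller gain on the final linear layer.

    Both gains are illustrative defaults (assumptions), not tuned values.
    """
    linears = [m for m in policy_net.modules() if isinstance(m, nn.Linear)]
    for i, layer in enumerate(linears):
        is_last = i == len(linears) - 1
        gain = last_layer_gain if is_last else hidden_gain
        nn.init.orthogonal_(layer.weight, gain=gain)
        if layer.bias is not None:
            nn.init.constant_(layer.bias, 0.0)
    return policy_net


# Example: a small policy MLP mapping 8 observations to 2 action outputs.
policy = init_policy_mlp(
    nn.Sequential(nn.Linear(8, 64), nn.Tanh(), nn.Linear(64, 2))
)
```

A small last-layer gain keeps the initial action outputs close to zero, so the initial policy depends only weakly on the observation.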