You signed in with another tab or window. Reload to refresh your session.You signed out in another tab or window. Reload to refresh your session.You switched accounts on another tab or window. Reload to refresh your session.Dismiss alert
in architect.py, Im confused about the following 3 lines of code:
v_grads = torch.autograd.grad(loss, v_alphas + v_weights)
dalpha = v_grads[:len(v_alphas)]
dw = v_grads[len(v_alphas):]
why does the gradient compute w.r.t (v_alphas+v_weights)? and the dalpha is retrieved from v_grads[:len(v_alphas)]. I thought it should be computed w.r.t v_alphas only based on equation (7).
the other question is why can you get dalpha and dw from v_grads directly instead of doing autograd separately?
The text was updated successfully, but these errors were encountered:
in architect.py, Im confused about the following 3 lines of code:
v_grads = torch.autograd.grad(loss, v_alphas + v_weights)
dalpha = v_grads[:len(v_alphas)]
dw = v_grads[len(v_alphas):]
why does the gradient compute w.r.t (v_alphas+v_weights)? and the dalpha is retrieved from v_grads[:len(v_alphas)]. I thought it should be computed w.r.t v_alphas only based on equation (7).
the other question is why can you get dalpha and dw from v_grads directly instead of doing autograd separately?
The text was updated successfully, but these errors were encountered: