Hi, I'm studying the code right now and have a question about the `Rotary` class.
After `q = q.view(B, T, self.config.n_head, C // self.config.n_head).transpose(1, 2)`, the shape of `q` is `[batch_size, n_heads, seq_length, head_dim]`.
When applying the rotary positional encoding to `q`, we extract the sequence length with `seq_len = q.shape[1]`, but at that point `q.shape[1]` should be `n_heads`, not `seq_length`, right? Shouldn't we use `seq_len = q.shape[2]` instead? Let me know if my understanding is wrong.
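To make the shapes concrete, here is a minimal sketch of the reshape described above. It uses NumPy as a stand-in for the PyTorch `view`/`transpose` calls, and the sizes (`B`, `T`, `n_head`, `head_dim`) are made-up values chosen only so that every dimension is distinct. It assumes the transpose happens before the rotary step, as in my description; it is not the repository's actual code.

```python
import numpy as np

# Hypothetical sizes, chosen only so that every dimension is distinct.
B, T, n_head, head_dim = 2, 5, 3, 4
C = n_head * head_dim

x = np.zeros((B, T, C))

# NumPy equivalent of q.view(B, T, n_head, C // n_head).transpose(1, 2):
# split the channel axis into heads, then swap the T and n_head axes.
q = x.reshape(B, T, n_head, C // n_head).transpose(0, 2, 1, 3)

print(q.shape)     # (2, 3, 5, 4) -> [batch_size, n_heads, seq_length, head_dim]
print(q.shape[1])  # 3, i.e. n_heads
print(q.shape[2])  # 5, i.e. seq_length
```

If the rotary step receives `q` in this layout, index 1 gives the number of heads and index 2 gives the sequence length.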