question for Rotary class #22

shao-shuai · 2025-03-06T22:25:54Z

Hi, I'm studying the code right now and have a question about the Rotary class.

After `q = q.view(B, T, self.config.n_head, C // self.config.n_head).transpose(1, 2)`, the shape of `q` is `[batch_size, n_heads, seq_length, head_dim]`.

When applying rotary positional encoding to `q`, the sequence length is extracted with `seq_len = q.shape[1]`, but at that point `q.shape[1]` should be `n_heads`, not `seq_length`, right? Shouldn't it be `seq_len = q.shape[2]` instead? Let me know if my understanding is wrong.
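Here is a quick shape check to make the question concrete. It is only a sketch: it assumes PyTorch, and the sizes `B`, `T`, `n_head`, and `head_dim` are made-up values chosen to be distinct so each axis is easy to identify. It does not show where the repository actually calls Rotary, so both layouts (before and after the transpose) are printed.

```python
import torch

# Hypothetical sizes (not from the repo), all different so axes can't be confused.
B, T, n_head, head_dim = 2, 5, 3, 4
C = n_head * head_dim

x = torch.randn(B, T, C)

# Layout before the transpose: [batch, seq, heads, head_dim].
q_before = x.view(B, T, n_head, C // n_head)

# Layout after the transpose quoted above: [batch, heads, seq, head_dim].
q = q_before.transpose(1, 2)

print(tuple(q_before.shape))  # (2, 5, 3, 4) -> shape[1] is the sequence length
print(tuple(q.shape))         # (2, 3, 5, 4) -> shape[1] is n_head, shape[2] is the sequence length
```

So `q.shape[1]` gives the sequence length only if Rotary receives the tensor before `.transpose(1, 2)`; if it receives the transposed tensor, `q.shape[2]` is the sequence axis.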
