From bbe1f8603157e094865350dde00fcfcab4181754 Mon Sep 17 00:00:00 2001
From: Shahrukh Khan
Date: Thu, 3 Oct 2024 13:43:56 +0200
Subject: [PATCH] Fix typos in big-bird.md

---
 big-bird.md | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

diff --git a/big-bird.md b/big-bird.md
index e76a4bd939..11eb2955b5 100644
--- a/big-bird.md
+++ b/big-bird.md
@@ -179,11 +179,11 @@ Random attention is ensuring that each query token will attend a few random toke
 
 ```python
 # r1, r2, r are some random indices; Note: r1, r2, r3 are different for each row 👇
-Q[1] x [Q[r1], Q[r2], ......, Q[r]]
+Q[1] x [K[r1], K[r2], ......, K[r]]
 .
 .
 .
-Q[n-2] x [Q[r1], Q[r2], ......, Q[r]]
+Q[n-2] x [K[r1], K[r2], ......, K[r]]
 
 # leaving 0th & (n-1)th token since they are already global
 ```
@@ -209,7 +209,7 @@ Attention score for \\(\mathbf{q}_{1}\\) represented by \\(a_1\\) where \\(a_1=S
 
 ---
 
-For calculating attention score for tokens in seconcd block, we are gathering the first three blocks, the last block, and the fifth block. Then we can compute \\(a_2 = Softmax(q_2 * concat(k_1, k_2, k_3, k_5, k_7)\\).
+For calculating attention score for tokens in second block, we are gathering the first three blocks, the last block, and the fifth block. Then we can compute \\(a_2 = Softmax(q_2 * concat(k_1, k_2, k_3, k_5, k_7)\\).
 
 ![BigBird block sparse attention](assets/18_big_bird/q2.png)
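
For context on what the corrected lines compute, below is a minimal NumPy sketch of the two index patterns the patch touches: the random-attention products `Q[i] x [K[r1], K[r2], ..., K[r]]` and the second-block gather `a_2 = softmax(q_2 * concat(k_1, k_2, k_3, k_5, k_7))`. The shapes, seed, and names (`n`, `d`, `r`, `block_size`) are illustrative assumptions for this sketch, not part of the patch or of the actual BigBird implementation.

```python
import numpy as np

# Toy setup (illustrative, not from the patch): 14 tokens of dim 8 -> 7 blocks of size 2
n, d, r, block_size = 14, 8, 3, 2
rng = np.random.default_rng(0)
Q = rng.standard_normal((n, d))
K = rng.standard_normal((n, d))

def softmax(x):
    e = np.exp(x - x.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)

# Random attention: every non-global query (rows 1 .. n-2) attends to a few
# randomly picked keys, i.e. Q[i] x [K[r1], K[r2], ..., K[r]] from the fixed lines.
random_scores = {}
for i in range(1, n - 1):                       # 0th & (n-1)th tokens are already global
    idx = rng.choice(n, size=r, replace=False)  # different random indices per row
    random_scores[i] = softmax(Q[i] @ K[idx].T)

# Second block's scores: gather the keys of blocks 1, 2, 3, 5 and 7 (1-indexed),
# i.e. a_2 = softmax(q_2 * concat(k_1, k_2, k_3, k_5, k_7)) from the fixed line.
k_blocks = K.reshape(n // block_size, block_size, d)
q_2 = Q.reshape(n // block_size, block_size, d)[1]                 # queries of the 2nd block
gathered = np.concatenate([k_blocks[b] for b in (0, 1, 2, 4, 6)])  # blocks 1, 2, 3, 5, 7
a_2 = softmax(q_2 @ gathered.T)                                    # (block_size, 5 * block_size)
```

The real model batches these gathers over heads and blocks; the sketch only mirrors the index pattern described by the corrected text.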