Why does using tl.multiple_of(a_ptr, [16, 16]) make it so much faster? #6202
Unanswered
foreverpiano asked this question in Q&A
Replies: 0 comments
The code is copied from grouped GEMM.
Without tl.multiple_of(a_ptr, [16, 16]): 0.100 s
With tl.multiple_of(a_ptr, [16, 16]): 0.030 s
That is about 3.3x faster.
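As a quick sanity check on the speedup figure, here is a minimal Python sketch that uses only the two timings quoted in the post. The variable names are made up for illustration. It does not run or time the kernel itself.

```python
# Timings quoted in the post, in seconds.
t_without_hint = 0.100  # no tl.multiple_of(a_ptr, [16, 16])
t_with_hint = 0.030     # with tl.multiple_of(a_ptr, [16, 16])

# Speedup is the ratio of the slower time to the faster time.
speedup = t_without_hint / t_with_hint
print(f"{speedup:.1f}x faster")
```

The ratio is 0.100 / 0.030, which rounds to the 3.3x reported.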