Replies: 1 comment
-
Questions about writing kernels are better asked on http://discord.com/invite/gpumode.
-
I want to calculate max(AB^T, axis=0) and max(AB^T, axis=1) in Triton, where A is an MxK matrix and B is an NxK matrix (M, N >> K).
I implemented a Triton kernel that calculates only max(AB^T, axis=1) using tiling:
However, to calculate both max(AB^T, axis=0) and max(AB^T, axis=1), we need to read A and B twice, like below:
I don't think this is I/O efficient.
Do you have any ideas for improving this kernel?
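One direction that might help is to update both reductions from the same tile of AB^T, so each tile is computed only once. The following is a minimal sketch of that idea in plain Python, not a Triton kernel; the function name `fused_row_col_max` and the `block_m` / `block_n` parameters are hypothetical and only illustrate the tiling.

```python
def fused_row_col_max(A, B, block_m=2, block_n=2):
    """Return (max(A @ B^T, axis=0), max(A @ B^T, axis=1)) in one pass over tiles.

    A is a list of M rows of length K, B is a list of N rows of length K.
    block_m and block_n are illustrative tile sizes (assumptions, not tuned values).
    """
    M, N, K = len(A), len(B), len(A[0])
    neg_inf = float("-inf")
    col_max = [neg_inf] * N  # axis=0 result: one entry per row of B
    row_max = [neg_inf] * M  # axis=1 result: one entry per row of A

    for m0 in range(0, M, block_m):
        for n0 in range(0, N, block_n):
            # One (block_m x block_n) tile of A @ B^T.
            # A GPU kernel would keep this tile on-chip instead of storing it.
            for i in range(m0, min(m0 + block_m, M)):
                for j in range(n0, min(n0 + block_n, N)):
                    c = sum(A[i][k] * B[j][k] for k in range(K))
                    # The same tile entry feeds both reductions.
                    if c > row_max[i]:
                        row_max[i] = c
                    if c > col_max[j]:
                        col_max[j] = c
    return col_max, row_max


A = [[1, 2], [3, 4], [0, -1]]          # M = 3, K = 2
B = [[1, 0], [0, 1], [1, 1], [-1, 2]]  # N = 4, K = 2
col_max, row_max = fused_row_col_max(A, B)
print(col_max)  # [3, 4, 7, 5]
print(row_max)  # [3, 7, 0]
```

In an actual Triton kernel, each program would own one tile and the per-tile partial maxima would still have to be combined across programs, for example with an atomic max on the two output vectors or with a small second reduction pass. Whether that beats two separate passes depends on the hardware and the shapes, so it would need to be measured.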