update custom PA kernel with support for fp8 kv cache dtype #87
Conversation
update custom PA kernel with support for fp8 kv cache dtype; change custom PA partition size to 512 to prefer throughput scenarios at cost of latency
@@ -20,6 +22,20 @@ typedef float16x4 _Half4;
 typedef struct _Half8 {
   _Half4 xy[2];
 } _Half8;
Are the types below defined to unify float16 and bfloat16?
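For context, a minimal sketch of what such a unification could look like: one templated 8-element vector built over either 4-element base type, so the same kernel code works for float16 and bfloat16. The struct definitions of `float16x4`/`bfloat16x4` are placeholders for the ROCm vector types, and `_T8` is a hypothetical name, not necessarily what the PR uses.

```cpp
#include <cstdint>

// Placeholder 4-element vector types; on ROCm these would be the native
// float16x4 / bfloat16x4 hardware vector types.
struct float16x4 { uint16_t v[4]; };
struct bfloat16x4 { uint16_t v[4]; };

// One generic 8-element vector over any 4-element base type, so the same
// kernel code can be instantiated for both float16 and bfloat16.
template <typename Vec4>
struct _T8 {
  Vec4 xy[2];
};

using _Half8  = _T8<float16x4>;   // same layout as the existing _Half8
using _Bf16x8 = _T8<bfloat16x4>;  // bfloat16 counterpart
```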
@@ -15,7 +15,7 @@
 # Should be the same as PARTITION_SIZE in `paged_attention_v2_launcher`.
 _PARTITION_SIZE_V1V2 = 512
-_PARTITION_SIZE_CUSTOM = 256
+_PARTITION_SIZE_CUSTOM = 512
Do we still need to keep this as a separate definition now?
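For reference, a minimal sketch (with assumed names, not the PR's actual code) of how the partition size drives the v2 launcher's grid, and why 512 trades latency for throughput: fewer, larger partitions mean less cross-partition reduction work per sequence, but also fewer blocks over which a single long sequence can be parallelized.

```cpp
// kPartitionSize must stay in sync with _PARTITION_SIZE_CUSTOM on the
// Python side; the names below are illustrative.
constexpr int kPartitionSize = 512;

inline int divide_round_up(int a, int b) { return (a + b - 1) / b; }

// Each sequence is split into ceil(seq_len / partition_size) partitions,
// each handled by one thread block, followed by a reduction across them.
inline int max_num_partitions(int max_seq_len) {
  return divide_round_up(max_seq_len, kPartitionSize);
}
```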
Ready to ship after some changes, thanks to @mawong-amd.
I've updated this on top of the latest main and also enabled the FP8 KV cache with BF16.
@mawong-amd I'll take a quick look.
* update custom PA kernel with support for fp8 kv cache dtype; change custom PA partition size to 512 to prefer throughput scenarios at cost of latency
* Fix lint
* Fix BF16 with FP8 KV cache (scaled conversion incorrectly done in fp16)
---------
Co-authored-by: Matthew Wong <[email protected]>
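The BF16 fix in the squashed commits is worth spelling out. Below is a minimal sketch of the idea with hypothetical helper names (the real kernel uses ROCm fp8 intrinsics): apply the KV-cache scale in fp32 and narrow to bf16 only at the end. Routing the scaled value through fp16 first loses bf16's wider exponent range.

```cpp
#include <cmath>
#include <cstdint>
#include <cstring>

using bf16_t = uint16_t;  // placeholder for the hardware bfloat16 type

// Simplified E4M3 decode (normal numbers only; real code also handles
// denormals and NaN).
inline float fp8_e4m3_to_float(uint8_t v) {
  int sign = (v >> 7) & 1;
  int exp  = (v >> 3) & 0xF;
  int man  = v & 0x7;
  float f = std::ldexp(1.0f + man / 8.0f, exp - 7);
  return sign ? -f : f;
}

// Truncating fp32 -> bf16 narrowing (rounding omitted for brevity).
inline bf16_t float_to_bf16(float v) {
  uint32_t bits;
  std::memcpy(&bits, &v, sizeof(bits));
  return static_cast<bf16_t>(bits >> 16);
}

// The fix: scale in full fp32 precision, then narrow once at the end.
// The bug scaled after converting to fp16, clamping values above fp16's
// ~65504 maximum and flushing small ones, even though bf16 can represent
// them.
inline bf16_t dequant_to_bf16(uint8_t fp8_val, float scale) {
  return float_to_bf16(fp8_e4m3_to_float(fp8_val) * scale);
}
```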
Custom PA kernel support added for the fp8 KV cache dtype.
Changed the custom PA partition size to 512 to favor throughput scenarios at the cost of latency.
Unit tested for fp16 and fp8 KV cache, with num_qheads=64, num_kvheads=8 only.
End-to-end tested on the MLPerf offline benchmark.
Unit tests will be updated in a separate PR.
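As a rough illustration of what fp8 KV cache support means inside the kernel (assumed structure, not the PR's actual code), the cache storage type becomes a template parameter and loads are dequantized with a per-tensor scale before use; `fp8_e4m3_to_float` is reused from the sketch above.

```cpp
#include <cstdint>

float fp8_e4m3_to_float(uint8_t v);  // assumed decode primitive (see above)

// Hypothetical dequant-on-load: cache_t is uint8_t when the KV cache is
// stored as fp8, or the model dtype when it is not.
template <typename cache_t, bool kCacheIsFp8>
inline float load_kv(const cache_t* cache, int idx, float kv_scale) {
  if constexpr (kCacheIsFp8) {
    return fp8_e4m3_to_float(cache[idx]) * kv_scale;  // dequantize on load
  } else {
    return static_cast<float>(cache[idx]);
  }
}
```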