Commit

Update README.md
hwu36 authored Jan 25, 2025
1 parent fca2114 commit b353e36
Showing 1 changed file with 1 addition and 6 deletions.
7 changes: 1 addition & 6 deletions README.md
@@ -99,12 +99,7 @@ CUTLASS team is working on a fix.
# Performance

CUTLASS primitives are very efficient. When used to construct device-wide GEMM kernels,
-they exhibit nearly optimal utilization of peak theoretical throughput. The figure below
-shows CUTLASS 3.8's performance as a % of theoretical peak utilization
-on various input and output data types when run on NVIDIA Blackwell SM100 architecture GPU.
-
-<p align="center"><img src=media/images/cutlass-3.8-blackwell-gemm-peak-performance.svg></p>
-
+they exhibit nearly optimal utilization of peak theoretical throughput.
The two figures below show the continual CUTLASS performance improvements
on an [NVIDIA H100](https://www.nvidia.com/en-us/data-center/h100/) (NVIDIA Hopper architecture) since
CUTLASS 3.1.
0 comments on commit b353e36