### ***Support Full Precision Inference of MoE-based Deepseek R1 671B on AMD MI300:***
We compare three solutions that support <ins>Full-Precision Inference (PPL = 0) of Deepseek R1 671B</ins>. PPL = 0 means that any quantization or unofficial sparsity techniques that may lower the model's scores are prohibited.

-----------
## What's New:
- Tutel v0.4.0: Accelerating Deepseek R1 Full-precision-Chat for AMD MI300x8 (more platform support will be added in later versions):
- Tutel v0.1: Optimize the Einsum Complexity of Data Dispatch Encoding and Decoding, add 2DH option to deal with All-to-All at scale:
```sh
>> Example (suggest enabling 2DH only at scale; note that the value of --nproc_per_node MUST equal the total physical GPU count per node, e.g. 8 for A100x8):
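# The invocation below is indicative rather than exact; it assumes tutel.examples.helloworld exposes a --use_2dh switch that enables the 2DH all-to-all:
python3 -m torch.distributed.run --nproc_per_node=8 -m tutel.examples.helloworld --batch_size=16 --use_2dh
```
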
-----------
### Advanced: Convert Checkpoint Files for Different World Sizes:
Documentation for checkpoint conversion has been moved [here](doc/CHECKPOINT.md).
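
As background for what the conversion does (this is not Tutel's converter; doc/CHECKPOINT.md describes the actual tool): an expert-parallel checkpoint stores each rank's local experts, so adapting it to another world size amounts to regrouping the same global list of expert tensors. The `reshard_experts` helper below is a hypothetical sketch of that regrouping for a single parameter.

```py
import torch

def reshard_experts(shards, new_world_size):
    # shards: one tensor per source rank, each shaped [local_experts, ...]
    merged = torch.cat(shards, dim=0)                        # recover the global expert order
    assert merged.shape[0] % new_world_size == 0, 'global expert count must be divisible by the new world size'
    return list(torch.chunk(merged, new_world_size, dim=0))  # one shard per destination rank

# 16 global experts saved by 8 ranks (2 experts each), regrouped for a 4-rank run (4 experts each):
source_shards = [torch.randn(2, 16, 32) for _ in range(8)]   # toy weight shapes
target_shards = reshard_experts(source_shards, 4)
print([tuple(s.shape) for s in target_shards])               # 4 entries of (4, 16, 32)
```
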
### Examples: How to import Tutel-optimized MoE in Pytorch:
```
# Input Example:
import torch
# (the definition of x and the construction of moe_layer are not shown in this excerpt; see the sketch below)
y = moe_layer(x)
print(y)
```
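
The middle of the excerpt above (defining `x` and constructing `moe_layer`) is not shown. Below is a minimal sketch of one way to fill it in, assuming a single-process run and using only the `gate_type`, `model_dim`, and `experts` arguments documented under "Usage of MOELayer" further down; the sizes are illustrative, not a recommended configuration.

```py
# Minimal single-process sketch (illustrative sizes; requires a CUDA/ROCm device):
import torch
from tutel import moe as tutel_moe

x = torch.randn(6, 1024, device='cuda')            # [num_tokens, model_dim]

moe_layer = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},             # top-2 gating
    model_dim=x.shape[-1],
    experts={
        'count_per_node': 2,                       # local experts on this device
        'type': 'ffn',
        'hidden_size_per_expert': 2048,
        'activation_fn': lambda t: torch.nn.functional.relu(t),
    },
).to('cuda')

y = moe_layer(x)                                   # output keeps the input shape [num_tokens, model_dim]
print(y)
```
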
### Reference
You can consult the [paper](https://arxiv.org/pdf/2206.03382.pdf) below for more technical details about Tutel:
```
@article{tutel,
  author = {Changho Hwang and Wei Cui and Yifan Xiong and Ziyue Yang and Ze Liu and Han Hu and Zilong Wang and Rafael Salas and Jithin Jose and Prabhat Ram and Joe Chau and Peng Cheng and Fan Yang and Mao Yang and Yongqiang Xiong},
  title = {Tutel: Adaptive Mixture-of-Experts at Scale},
  year = {2022},
  month = jun,
  journal = {CoRR},
  volume = {abs/2206.03382},
  url = {https://arxiv.org/pdf/2206.03382.pdf},
}
```
### Usage of MOELayer:
```
* Usage of MOELayer Args:
        ...
        has_fc2_bias : If set to False, the expert bias parameter `batched_fc2_bias` is disabled. Default: True
```
### Contributing
This project welcomes contributions and suggestions. Most contributions require you to agree to a