README.md (+4, -1)
@@ -99,11 +99,14 @@ Usage of MOELayer:
         or a list of dict-type gate descriptions, e.g. [{'type': 'top', 'k': 2}, {'type': 'top', 'k': 2}],
         the value of k in top-gating can also be negative, e.g. -2, which indicates one GPU will hold 1/(-k) parameters of an expert
     model_dim : the number of channels of MOE's input tensor
-    experts : a dict-type config for builtin expert network, or a torch.nn.Module-type custom expert network
+    experts : a dict-type config for builtin expert network
     scan_expert_func : allow users to specify a lambda function to iterate over each expert's params, e.g. `scan_expert_func = lambda name, param: setattr(param, 'expert', True)`
     result_func : allow users to specify a lambda function to format the MoE output and aux_loss, e.g. `result_func = lambda output: (output, output.l_aux)`
     group : specify the explicit communication group of all_to_all
     seeds : a tuple containing a triple of ints to specify the manual seed of (shared params, local params, other params after MoE's)
+    a2a_ffn_overlap_degree : the value to control a2a overlap depth, 1 by default for no overlap, 2 for overlapping a2a with half of the gemm, ..
+    parallel_type : the parallel method to compute MoE, valid types: 'auto', 'data', 'model'
+    pad_samples : whether to auto-pad newly-coming input data to the maximum data size in history
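For reference, a minimal usage sketch of the arguments documented above. It assumes the `tutel.moe.moe_layer` entry point from this repository; the concrete values (model_dim=1024, the 'ffn' expert config, tensor shapes) are illustrative assumptions, not taken from the diff, and a distributed setup may be required depending on the environment.

```python
# Minimal sketch of constructing an MoE layer with the options documented above.
# model_dim, the expert config values, and the input shape are illustrative only.
import torch
from tutel import moe as tutel_moe

# (depending on the setup, torch.distributed may need to be initialized first)
moe = tutel_moe.moe_layer(
    gate_type={'type': 'top', 'k': 2},                  # dict-type gate description
    model_dim=1024,                                      # channels of the MoE input tensor
    experts={'type': 'ffn', 'count_per_node': 2,         # builtin expert network config
             'hidden_size_per_expert': 4096,
             'activation_fn': lambda x: torch.nn.functional.relu(x)},
    scan_expert_func=lambda name, param: setattr(param, 'skip_allreduce', True),
    seeds=(1, 1, 1),                                     # (shared, local, post-MoE) manual seeds
    a2a_ffn_overlap_degree=1,                            # 1 = no a2a/gemm overlap
    parallel_type='auto',                                # 'auto' | 'data' | 'model'
)

x = torch.randn(4, 512, 1024)                            # [batch, tokens, model_dim]
y = moe(x)                                               # forward pass; auxiliary loss at y.l_aux
```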
logging.warning(f'`fp32_gate` option in tutel.moe_layer has been deprecated, please move this option to gate_type = {{.., "fp32_gate": {kwargs["fp32_gate"]}}} instead.')
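The deprecation warning above describes the migration path: the standalone `fp32_gate` option is moved into the gate description passed to `gate_type`. A hedged sketch, where `fp32_gate=True` is an assumed example value:

```python
# Migration sketch for the fp32_gate deprecation above.
# Old (deprecated): tutel_moe.moe_layer(..., fp32_gate=True)
# New: carry the flag inside the gate description dict instead.
gate_type = {'type': 'top', 'k': 2, 'fp32_gate': True}
```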