
Model cannot be loaded properly onto Multiple GPU. #4

Open
is opened this issue May 14, 2023 · 1 comment

Comments

@is

is commented May 14, 2023

I tried to load the 65B model onto 2x V100 32 GB GPUs.
Command line:

llmtune generate --model llama-65b-4bit \
--weights ../llama-int4/llama-65b-4bit.pt \
--prompt "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A: Let's think step-by-step."

I got the error:

Traceback (most recent call last):
  File "/home/c/envs/llmtune/bin/llmtune", line 33, in <module>
    sys.exit(load_entry_point('llmtune==0.1.0', 'console_scripts', 'llmtune')())
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/llmtune-0.1.0-py3.10.egg/llmtune/run.py", line 101, in main
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/llmtune-0.1.0-py3.10.egg/llmtune/run.py", line 116, in generate
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/llmtune-0.1.0-py3.10.egg/llmtune/executor.py", line 55, in generate
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1896, in to
    return super().to(*args, **kwargs)
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in to
    return self._apply(convert)
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 500.00 MiB (GPU 0; 31.75 GiB total capacity; 30.88 GiB already allocated; 137.75 MiB free; 30.88 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
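
From the traceback, the whole model is being moved with a single .to(...) call (transformers modeling_utils.py into nn.Module.to), which places every weight on GPU 0 instead of splitting it across both V100s. For comparison, here is a rough sketch of the kind of sharding I mean, using accelerate's device_map through transformers. This is not llmtune's API: it assumes a standard, unquantized Hugging Face LLaMA checkpoint at a placeholder path, and the 4-bit .pt weights used above would need equivalent support inside llmtune's own loader.

# Sketch only: shard a Hugging Face LLaMA checkpoint across two 32 GB GPUs
# instead of calling model.to("cuda:0") on the whole thing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-65b-hf"  # placeholder, not a path from this repo

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",                    # let accelerate split layers over cuda:0 and cuda:1
    max_memory={0: "30GiB", 1: "30GiB"},  # leave headroom for activations
)

prompt = "Q: Roger has 5 tennis balls. ..."
inputs = tokenizer(prompt, return_tensors="pt").to(0)  # inputs start on the first device
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Even split across both cards, an fp16 65B model is far larger than 64 GB, so in practice it is the quantized weights themselves that would have to be dispatched this way.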
@sidonsoft
Copy link

Not training on multi-GPU either:

***** Running training *****
  Num examples = 98,084
  Num Epochs = 3
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 2
  Total optimization steps = 147,126
  Number of trainable parameters = 20,971,520
  0%|          | 0/147126 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/llmtune", line 33, in <module>
    sys.exit(load_entry_point('llmtune==0.1.0', 'console_scripts', 'llmtune')())
  File "/usr/local/lib/python3.10/dist-packages/llmtune-0.1.0-py3.10.egg/llmtune/run.py", line 101, in main
  File "/usr/local/lib/python3.10/dist-packages/llmtune-0.1.0-py3.10.egg/llmtune/run.py", line 147, in finetune
  File "/usr/local/lib/python3.10/dist-packages/llmtune-0.1.0-py3.10.egg/llmtune/executor.py", line 128, in finetune
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2699, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2731, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 575, in forward
    return self.base_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
    hidden_states = self.mlp(hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 157, in forward
    return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llmtune-0.1.0-py3.10.egg/llmtune/engine/quant/modules.py", line 77, in forward
  File "/usr/local/lib/python3.10/dist-packages/llmtune-0.1.0-py3.10.egg/llmtune/engine/quant/modules.py", line 100, in forward
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat2 in method wrapper_mm)
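The mismatch appears to come from accelerate's device-map hooks (visible in the traceback) placing decoder layers on both GPUs, while the quantized linear forward in llmtune/engine/quant/modules.py runs a matmul between an activation on one device and weights on the other. A rough sketch of the usual fix pattern follows; the class and the dequantize step are stand-ins, not llmtune's actual code:

# Hypothetical illustration (names and structure assumed, not llmtune's source):
# align the activation with the layer's weight device before the matmul, so a
# cuda:0 activation never hits cuda:1 weights.
import torch

def dequantize(qweight: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # Placeholder for the real 4-bit unpacking done by llmtune's CUDA kernels.
    return qweight.float() * scales

class QuantLinearSketch(torch.nn.Module):
    # Simplified stand-in for the quantized linear layer in quant/modules.py.
    def __init__(self, qweight: torch.Tensor, scales: torch.Tensor):
        super().__init__()
        self.register_buffer("qweight", qweight)  # packed weights, shape (out, in)
        self.register_buffer("scales", scales)    # per-row scales, shape (out, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.to(self.qweight.device)  # key line: move the input to this layer's GPU
        weight = dequantize(self.qweight, self.scales)
        return x @ weight.t()

Until something like that is in place, restricting the run to a single card (for example CUDA_VISIBLE_DEVICES=0) should avoid the crash, at the cost of not using the second GPU.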
