
Model cannot be loaded properly onto Multiple GPU. #4

Open
is opened this issue May 14, 2023 · 1 comment

Comments

@is

is commented May 14, 2023

I tried to load the 65B model onto 2x V100 32 GB GPUs.
Command line:

llmtune generate --model llama-65b-4bit \
--weights ../llama-int4/llama-65b-4bit.pt \
--prompt "Q: Roger has 5 tennis balls. He buys 2 more cans of tennis balls. Each can has 3 tennis balls. How many tennis balls does he have now? A: Let's think step-by-step."

I got the error:

Traceback (most recent call last):
  File "/home/c/envs/llmtune/bin/llmtune", line 33, in <module>
    sys.exit(load_entry_point('llmtune==0.1.0', 'console_scripts', 'llmtune')())
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/llmtune-0.1.0-py3.10.egg/llmtune/run.py", line 101, in main
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/llmtune-0.1.0-py3.10.egg/llmtune/run.py", line 116, in generate
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/llmtune-0.1.0-py3.10.egg/llmtune/executor.py", line 55, in generate
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/transformers/modeling_utils.py", line 1896, in to
    return super().to(*args, **kwargs)
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 989, in to
    return self._apply(convert)
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 641, in _apply
    module._apply(fn)
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 664, in _apply
    param_applied = fn(param)
  File "/home/c/envs/llmtune/lib/python3.10/site-packages/torch/nn/modules/module.py", line 987, in convert
    return t.to(device, dtype if t.is_floating_point() or t.is_complex() else None, non_blocking)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 500.00 MiB (GPU 0; 31.75 GiB total capacity; 30.88 GiB already allocated; 137.75 MiB free; 30.88 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
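
From the traceback, the whole model is being moved with a single .to(...) call (transformers modeling_utils.py into nn.Module.to), which places every weight on GPU 0 instead of splitting it across both V100s. For comparison, here is a rough sketch of the kind of sharding I mean, using accelerate's device_map through transformers. This is not llmtune's API: it assumes a standard, unquantized Hugging Face LLaMA checkpoint at a placeholder path, and the 4-bit .pt weights used above would need equivalent support inside llmtune's own loader.

# Sketch only: shard a Hugging Face LLaMA checkpoint across two 32 GB GPUs
# instead of calling model.to("cuda:0") on the whole thing.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_path = "path/to/llama-65b-hf"  # placeholder, not a path from this repo

tokenizer = AutoTokenizer.from_pretrained(model_path)
model = AutoModelForCausalLM.from_pretrained(
    model_path,
    torch_dtype=torch.float16,
    device_map="auto",                    # let accelerate split layers over cuda:0 and cuda:1
    max_memory={0: "30GiB", 1: "30GiB"},  # leave headroom for activations
)

prompt = "Q: Roger has 5 tennis balls. ..."
inputs = tokenizer(prompt, return_tensors="pt").to(0)  # inputs start on the first device
out = model.generate(**inputs, max_new_tokens=64)
print(tokenizer.decode(out[0], skip_special_tokens=True))

Even split across both cards, an fp16 65B model is far larger than 64 GB, so in practice it is the quantized weights themselves that would have to be dispatched this way.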
@sidonsoft
Copy link

Not training on multi-GPU either:

***** Running training *****
  Num examples = 98,084
  Num Epochs = 3
  Instantaneous batch size per device = 1
  Total train batch size (w. parallel, distributed & accumulation) = 2
  Gradient Accumulation steps = 2
  Total optimization steps = 147,126
  Number of trainable parameters = 20,971,520
  0%|          | 0/147126 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/usr/local/bin/llmtune", line 33, in <module>
    sys.exit(load_entry_point('llmtune==0.1.0', 'console_scripts', 'llmtune')())
  File "/usr/local/lib/python3.10/dist-packages/llmtune-0.1.0-py3.10.egg/llmtune/run.py", line 101, in main
  File "/usr/local/lib/python3.10/dist-packages/llmtune-0.1.0-py3.10.egg/llmtune/run.py", line 147, in finetune
  File "/usr/local/lib/python3.10/dist-packages/llmtune-0.1.0-py3.10.egg/llmtune/executor.py", line 128, in finetune
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1662, in train
    return inner_training_loop(
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 1929, in _inner_training_loop
    tr_loss_step = self.training_step(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2699, in training_step
    loss = self.compute_loss(model, inputs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/trainer.py", line 2731, in compute_loss
    outputs = model(**inputs)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/peft/peft_model.py", line 575, in forward
    return self.base_model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 687, in forward
    outputs = self.model(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 577, in forward
    layer_outputs = decoder_layer(
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 305, in forward
    hidden_states = self.mlp(hidden_states)
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/transformers/models/llama/modeling_llama.py", line 157, in forward
    return self.down_proj(self.act_fn(self.gate_proj(x)) * self.up_proj(x))
  File "/usr/local/lib/python3.10/dist-packages/torch/nn/modules/module.py", line 1194, in _call_impl
    return forward_call(*input, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/accelerate/hooks.py", line 165, in new_forward
    output = old_forward(*args, **kwargs)
  File "/usr/local/lib/python3.10/dist-packages/llmtune-0.1.0-py3.10.egg/llmtune/engine/quant/modules.py", line 77, in forward
  File "/usr/local/lib/python3.10/dist-packages/llmtune-0.1.0-py3.10.egg/llmtune/engine/quant/modules.py", line 100, in forward
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument mat2 in method wrapper_mm)
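The mismatch appears to come from accelerate's device-map hooks (visible in the traceback) placing decoder layers on both GPUs, while the quantized linear forward in llmtune/engine/quant/modules.py runs a matmul between an activation on one device and weights on the other. A rough sketch of the usual fix pattern follows; the class and the dequantize step are stand-ins, not llmtune's actual code:

# Hypothetical illustration (names and structure assumed, not llmtune's source):
# align the activation with the layer's weight device before the matmul, so a
# cuda:0 activation never hits cuda:1 weights.
import torch

def dequantize(qweight: torch.Tensor, scales: torch.Tensor) -> torch.Tensor:
    # Placeholder for the real 4-bit unpacking done by llmtune's CUDA kernels.
    return qweight.float() * scales

class QuantLinearSketch(torch.nn.Module):
    # Simplified stand-in for the quantized linear layer in quant/modules.py.
    def __init__(self, qweight: torch.Tensor, scales: torch.Tensor):
        super().__init__()
        self.register_buffer("qweight", qweight)  # packed weights, shape (out, in)
        self.register_buffer("scales", scales)    # per-row scales, shape (out, 1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        x = x.to(self.qweight.device)  # key line: move the input to this layer's GPU
        weight = dequantize(self.qweight, self.scales)
        return x @ weight.t()

Until something like that is in place, restricting the run to a single card (for example CUDA_VISIBLE_DEVICES=0) should avoid the crash, at the cost of not using the second GPU.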
