Flux.1 Schnell, memory issue on AMD Rocm #4341
Comments
Before the changes I could stay under 12 GB of total VRAM usage when loading the model; after the changes, I run into the 16 GB memory limit when the FLUX transformer unet is loaded.
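For anyone trying to quantify that change, here is a minimal sketch of measuring peak VRAM usage around a model load or a sampling step. It assumes a ROCm build of PyTorch, where the HIP device is driven through the regular `torch.cuda` API; the counters below are standard PyTorch calls, not anything ComfyUI-specific:

```python
import torch

# Reset the peak counters, run the step you want to measure, then read
# the peaks back. On ROCm builds the HIP device is exposed through the
# torch.cuda namespace, so the same calls apply.
torch.cuda.reset_peak_memory_stats()

# ... load the FLUX transformer or run one sampling step here ...

peak_alloc = torch.cuda.max_memory_allocated() / 1024**3
peak_reserved = torch.cuda.max_memory_reserved() / 1024**3
print(f"peak allocated: {peak_alloc:.2f} GiB, peak reserved: {peak_reserved:.2f} GiB")
```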
I tried to run the smallest Flux.1 Schnell GGUF and I also have this issue
@arch-user-france1 But I could run Stable Diffusion on ROCm on my hardware by loading the model into RAM, so I think a quantized Flux.1 Schnell should work too.
Flux needs about 32 GB in bfloat16 and I do not expect this to reduce to 1 GB. Are you really running a quantized model, and if so, what dtype is it in?
With Stable Diffusion I could load the model in RAM (I have 48 GB) and it worked. The file I'm using is flux1-schnell-Q2_K.gguf.
Looks like the GGUF model is already larger than 1 GB. Are you able to enable CPU offloading somewhere? Note that system RAM is not what determines the model size your device can handle; you need more VRAM.
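For a rough sense of scale, here is a back-of-envelope sketch of the raw weight sizes involved. The parameter counts (roughly 12B for the Flux transformer, roughly 4.7B for the T5-XXL text encoder) and the ~2.6 bits per weight for Q2_K are approximations used for illustration, not exact figures:

```python
def weight_size_gib(params_billion: float, bits_per_weight: float) -> float:
    """Approximate size of the raw weights in GiB."""
    return params_billion * 1e9 * bits_per_weight / 8 / 1024**3

# Rough parameter counts (assumptions for illustration only).
print(f"Flux transformer, bf16:    {weight_size_gib(12, 16):.1f} GiB")   # ~22 GiB
print(f"T5-XXL text encoder, bf16: {weight_size_gib(4.7, 16):.1f} GiB")  # ~9 GiB
print(f"Flux transformer, Q2_K:    {weight_size_gib(12, 2.6):.1f} GiB")  # ~4 GiB
```

So even at Q2_K the transformer alone is still several GiB, before activations, the text encoder, and the VAE are counted.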
I think the issue is a regression in the latest versions of PyTorch / ROCm.
AMD has added my issue to their internal tracker, so hopefully they can reproduce it and get a fix out.
You stated: "I have an AMD Radeon RX 7800 XT 16 GB, I couldn't select it in the list." What list do you mean? Ah okay, there's no option for that in the AMD issue tracker. Still, you might consider reinstalling your driver if you haven't done that yet, specifically without DKMS, which has been working well for me.
Try creating an issue on their GitHub page; that's where the list is.
lol, every day it is something new. I kind of enjoy it at this point.
Try setting
Yes, you would think I would know this by now.
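For reference, a minimal sketch of setting ROCm-related environment variables before PyTorch is imported. The specific variables and values below are assumptions (commonly reported workarounds for unsupported iGPUs), not values confirmed in this thread; `PYTORCH_HIP_ALLOC_CONF` is the allocator knob the OOM message in the log refers to:

```python
import os

# These must be set before `import torch` (or before launching ComfyUI).
# HSA_OVERRIDE_GFX_VERSION makes ROCm treat an otherwise unsupported
# iGPU as a supported gfx target; 10.3.0 is a commonly used value for
# RDNA2-class chips, but the right one depends on the hardware (assumption).
os.environ["HSA_OVERRIDE_GFX_VERSION"] = "10.3.0"
# A smaller max_split_size_mb can reduce allocator fragmentation, as the
# OOM message itself suggests.
os.environ["PYTORCH_HIP_ALLOC_CONF"] = "max_split_size_mb:128"

import torch
print(torch.cuda.is_available(), torch.cuda.get_device_name(0))
```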
I want to use the iGPU (it worked with Stable Diffusion web UI), but with those env variables I get:
When using ROCm with an AMD GPU, you really have to use xformers to run any of the FLUX.1 models.
Expected Behavior
I expect it to work because I can run SDXL on an AMD Ryzen 7 7700 on Linux.
Actual Behavior
Steps to Reproduce
I installed ComfyUI via Docker.
Debug Logs
comfyui-rocm | [INFO] Running set-proxy script...
comfyui-rocm | [INFO] Continue without proxy.
comfyui-rocm | [INFO] Running pre-start script...
comfyui-rocm | [INFO] Continue without pre-start script.
comfyui-rocm | ########################################
comfyui-rocm | [INFO] Starting ComfyUI...
comfyui-rocm | ########################################
comfyui-rocm | [ComfyUI-Manager] 'distutils' package not found. Activating fallback mode for compatibility.
comfyui-rocm | [START] Security scan
comfyui-rocm | [DONE] Security scan
comfyui-rocm | ## ComfyUI-Manager: installing dependencies done.
comfyui-rocm | ** ComfyUI startup time: 2024-08-13 19:33:29.232847
comfyui-rocm | ** Platform: Linux
comfyui-rocm | ** Python version: 3.10.14 (main, Mar 21 2024, 16:45:28) [GCC]
comfyui-rocm | ** Python executable: /usr/bin/python3
comfyui-rocm | ** ComfyUI Path: /root/ComfyUI
comfyui-rocm | ** Log path: /root/comfyui.log
comfyui-rocm |
comfyui-rocm | Prestartup times for custom nodes:
comfyui-rocm | 0.5 seconds: /root/ComfyUI/custom_nodes/ComfyUI-Manager
comfyui-rocm |
comfyui-rocm | Total VRAM 512 MB, total RAM 47379 MB
comfyui-rocm | pytorch version: 2.1.2+rocm6.1.3
comfyui-rocm | Set vram state to: LOW_VRAM
comfyui-rocm | Device: cuda:0 AMD Radeon Graphics : native
comfyui-rocm | Using split optimization for cross attention
comfyui-rocm | [Prompt Server] web root: /root/ComfyUI/web
comfyui-rocm | ### Loading: ComfyUI-Manager (V2.48.7)
comfyui-rocm | ### ComfyUI Revision: 174 [39fb74c] | Released on '2024-08-13'
comfyui-rocm | [Crystools INFO] Crystools version: 1.16.6
comfyui-rocm | [Crystools INFO] CPU: AMD Ryzen 7 7700 8-Core Processor - Arch: x86_64 - OS: Linux 6.1.0-18-amd64
comfyui-rocm | [Crystools ERROR] Could not init pynvml (Nvidia).NVML Shared Library Not Found
comfyui-rocm | [Crystools WARNING] No GPU with CUDA detected.
Efficiency Nodes: Attempting to add Control Net options to the 'HiRes-Fix Script' Node (comfyui_controlnet_aux add-on)...Success!
comfyui-rocm | ### Loading: ComfyUI-Impact-Pack (V6.2)
comfyui-rocm | ### Loading: ComfyUI-Impact-Pack (Subpack: V0.6)
comfyui-rocm | [Impact Pack] Wildcards loading done.
comfyui-rocm | ### Loading: ComfyUI-Inspire-Pack (V0.83)
comfyui-rocm | [AnimateDiffEvo] - ERROR - No motion models found. Please download one and place in: ['/root/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved/models', '/root/ComfyUI/models/animatediff_models']
comfyui-rocm |
comfyui-rocm | Import times for custom nodes:
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/websocket_image_save.py
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/ComfyUI-Crystools-save
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/sdxl_prompt_styler
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/cg-use-everywhere
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/comfyui_controlnet_aux
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/AIGODLIKE-ComfyUI-Translation
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/ComfyUI_IPAdapter_plus
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/ComfyUI-Custom-Scripts
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/ComfyUI-Frame-Interpolation
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/ComfyUI-Manager
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/ComfyUI_essentials
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/ComfyUI-AnimateDiff-Evolved
comfyui-rocm | 0.0 seconds: /root/ComfyUI/custom_nodes/ComfyUI-Inspire-Pack
comfyui-rocm | 0.1 seconds: /root/ComfyUI/custom_nodes/ComfyUI-Crystools
comfyui-rocm | 0.1 seconds: /root/ComfyUI/custom_nodes/ComfyUI-Impact-Pack
comfyui-rocm | 0.3 seconds: /root/ComfyUI/custom_nodes/ComfyUI-VideoHelperSuite
comfyui-rocm | 0.4 seconds: /root/ComfyUI/custom_nodes/efficiency-nodes-comfyui
comfyui-rocm |
comfyui-rocm | Starting server
comfyui-rocm |
comfyui-rocm | To see the GUI go to: http://0.0.0.0:8188
comfyui-rocm | FETCH DATA from: /root/ComfyUI/custom_nodes/ComfyUI-Manager/extension-node-map.json [DONE]
comfyui-rocm | got prompt
comfyui-rocm | model weight dtype torch.bfloat16, manual cast: None
comfyui-rocm | model_type FLOW
comfyui-rocm | /usr/local/lib/python3.10/site-packages/transformers/tokenization_utils_base.py:1601: FutureWarning: `clean_up_tokenization_spaces` was not set. It will be set to `True` by default. This behavior will be depracted in transformers v4.45, and will be then set to `False` by default. For more details check this issue: huggingface/transformers#31884
comfyui-rocm |   warnings.warn(
comfyui-rocm | Requested to load FluxClipModel_
comfyui-rocm | Loading 1 new model
comfyui-rocm | clip missing: ['text_projection.weight']
comfyui-rocm | Requested to load Flux
comfyui-rocm | Loading 1 new model
comfyui-rocm | loaded partially 64.0 60.7852783203125 0
0%| | 0/4 [01:11<?, ?it/s]
comfyui-rocm | !!! Exception during processing!!! HIP out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacty of 512.00 MiB of which 16.00 MiB is free. Of the allocated memory 204.22 MiB is allocated by PyTorch, and 177.78 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF
comfyui-rocm | Traceback (most recent call last):
comfyui-rocm | File "/root/ComfyUI/execution.py", line 152, in recursive_execute
comfyui-rocm | output_data, output_ui = get_output_data(obj, input_data_all)
comfyui-rocm | File "/root/ComfyUI/execution.py", line 82, in get_output_data
comfyui-rocm | return_values = map_node_over_list(obj, input_data_all, obj.FUNCTION, allow_interrupt=True)
comfyui-rocm | File "/root/ComfyUI/execution.py", line 75, in map_node_over_list
comfyui-rocm | results.append(getattr(obj, func)(**slice_dict(input_data_all, i)))
comfyui-rocm | File "/root/ComfyUI/comfy_extras/nodes_custom_sampler.py", line 612, in sample
comfyui-rocm | samples = guider.sample(noise.generate_noise(latent), latent_image, sampler, sigmas, denoise_mask=noise_mask, callback=callback, disable_pbar=disable_pbar, seed=noise.seed)
comfyui-rocm | File "/root/ComfyUI/comfy/samplers.py", line 716, in sample
comfyui-rocm | output = self.inner_sample(noise, latent_image, device, sampler, sigmas, denoise_mask, callback, disable_pbar, seed)
comfyui-rocm | File "/root/ComfyUI/comfy/samplers.py", line 695, in inner_sample
comfyui-rocm | samples = sampler.sample(self, sigmas, extra_args, callback, noise, latent_image, denoise_mask, disable_pbar)
comfyui-rocm | File "/root/ComfyUI/comfy/samplers.py", line 600, in sample
comfyui-rocm | samples = self.sampler_function(model_k, noise, sigmas, extra_args=extra_args, callback=k_callback, disable=disable_pbar, **self.extra_options)
comfyui-rocm | File "/usr/local/lib64/python3.10/site-packages/torch/utils/contextlib.py", line 115, in decorate_context
comfyui-rocm | return func(*args, **kwargs)
comfyui-rocm | File "/root/ComfyUI/comfy/k_diffusion/sampling.py", line 143, in sample_euler
comfyui-rocm | denoised = model(x, sigma_hat * s_in, **extra_args)
comfyui-rocm | File "/root/ComfyUI/comfy/samplers.py", line 299, in call
comfyui-rocm | out = self.inner_model(x, sigma, model_options=model_options, seed=seed)
comfyui-rocm | File "/root/ComfyUI/comfy/samplers.py", line 682, in call
comfyui-rocm | return self.predict_noise(*args, **kwargs)
comfyui-rocm | File "/root/ComfyUI/comfy/samplers.py", line 685, in predict_noise
comfyui-rocm | return sampling_function(self.inner_model, x, timestep, self.conds.get("negative", None), self.conds.get("positive", None), self.cfg, model_options=model_options, seed=seed)
comfyui-rocm | File "/root/ComfyUI/comfy/samplers.py", line 279, in sampling_function
comfyui-rocm | out = calc_cond_batch(model, conds, x, timestep, model_options)
comfyui-rocm | File "/root/ComfyUI/comfy/samplers.py", line 228, in calc_cond_batch
comfyui-rocm | output = model.apply_model(input_x, timestep, **c).chunk(batch_chunks)
comfyui-rocm | File "/root/ComfyUI/custom_nodes/ComfyUI-Advanced-ControlNet/adv_control/utils.py", line 68, in apply_model_uncond_cleanup_wrapper
comfyui-rocm | return orig_apply_model(self, *args, **kwargs)
comfyui-rocm | File "/root/ComfyUI/comfy/model_base.py", line 145, in apply_model
comfyui-rocm | model_output = self.diffusion_model(xc, t, context=context, control=control, transformer_options=transformer_options, **extra_conds).float()
comfyui-rocm | File "/usr/local/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
comfyui-rocm | return self._call_impl(*args, **kwargs)
comfyui-rocm | File "/usr/local/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
comfyui-rocm | return forward_call(*args, **kwargs)
comfyui-rocm | File "/root/ComfyUI/comfy/ldm/flux/model.py", line 150, in forward
comfyui-rocm | out = self.forward_orig(img, img_ids, context, txt_ids, timestep, y, guidance, control)
comfyui-rocm | File "/root/ComfyUI/comfy/ldm/flux/model.py", line 129, in forward_orig
comfyui-rocm | img = block(img, vec=vec, pe=pe)
comfyui-rocm | File "/usr/local/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
comfyui-rocm | return self._call_impl(*args, **kwargs)
comfyui-rocm | File "/usr/local/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
comfyui-rocm | return forward_call(*args, **kwargs)
comfyui-rocm | File "/root/ComfyUI/comfy/ldm/flux/layers.py", line 233, in forward
comfyui-rocm | output = self.linear2(torch.cat((attn, self.mlp_act(mlp)), 2))
comfyui-rocm | File "/usr/local/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1518, in _wrapped_call_impl
comfyui-rocm | return self._call_impl(*args, **kwargs)
comfyui-rocm | File "/usr/local/lib64/python3.10/site-packages/torch/nn/modules/module.py", line 1527, in _call_impl
comfyui-rocm | return forward_call(*args, **kwargs)
comfyui-rocm | File "/root/ComfyUI/comfy/ops.py", line 63, in forward
comfyui-rocm | return self.forward_comfy_cast_weights(*args, **kwargs)
comfyui-rocm | File "/root/ComfyUI/comfy/ops.py", line 58, in forward_comfy_cast_weights
comfyui-rocm | weight, bias = cast_bias_weight(self, input)
comfyui-rocm | File "/root/ComfyUI/comfy/ops.py", line 42, in cast_bias_weight
comfyui-rocm | weight = cast_to(s.weight, dtype, device, non_blocking=non_blocking)
comfyui-rocm | File "/root/ComfyUI/comfy/ops.py", line 24, in cast_to
comfyui-rocm | return weight.to(device=device, dtype=dtype, non_blocking=non_blocking)
comfyui-rocm | torch.cuda.OutOfMemoryError: HIP out of memory. Tried to allocate 90.00 MiB. GPU 0 has a total capacty of 512.00 MiB of which 16.00 MiB is free. Of the allocated memory 204.22 MiB is allocated by PyTorch, and 177.78 MiB is reserved by PyTorch but unallocated. If reserved but unallocated memory is large try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_HIP_ALLOC_CONF
comfyui-rocm |
comfyui-rocm | Got an OOM, unloading all loaded models.
comfyui-rocm | Prompt executed in 98.52 seconds
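The 512 MB reported in the log ("Total VRAM 512 MB") and in the OOM message is what the HIP runtime exposes for the iGPU, presumably the BIOS carve-out rather than the 48 GB of system RAM. Here is a minimal sketch for checking what PyTorch actually sees on the device, using standard `torch.cuda` calls that also work on ROCm builds:

```python
import torch

# Report what device 0 exposes to PyTorch. If this prints ~512 MiB,
# the OOM is expected: the FLUX weights simply cannot fit on-device.
props = torch.cuda.get_device_properties(0)
free, total = torch.cuda.mem_get_info(0)
print(f"device:       {props.name}")
print(f"total_memory: {props.total_memory / 1024**2:.0f} MiB")
print(f"free/total:   {free / 1024**2:.0f} / {total / 1024**2:.0f} MiB")
```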