Extra ai tools error #189

Closed · Eelluu opened this issue Jan 9, 2025 · 5 comments

Eelluu (Contributor) commented Jan 9, 2025

Hello, I have built the extra AI tools as described in the README, but I get errors when running two of them: vLLM and stable-diffusion-webui.

I built and installed vLLM, but every time I try to use it I get ModuleNotFoundError: No module named 'vllm'.
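
A quick way to reproduce what I mean (these are ordinary Python/pip commands, nothing SDK-specific; the environment script is the one the SDK ships):

```
# Load the SDK environment, then try the import directly.
source /opt/rocm_sdk_612/bin/env_rocm.sh
python -c "import vllm; print(vllm.__file__)"
# If the import fails, check whether pip sees the package at all:
pip show vllm
```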

Every time I generate an image using stable-diffusion-webui I get the same error:
pydantic_core._pydantic_core.ValidationError: 3 validation errors for ProgressResponse
It seems that the image is generated, but it is not shown on the webpage.

I am using rocm_sdk_builder 6.1.2 with an RX 5500M (gfx1012). llama.cpp works great.

lamikr (Owner) commented Jan 9, 2025

Hi, I will try to test and check these during the weekend. I have an RX 5700 myself, but I can do the build for the RX 5500 and check whether I can force my RX 5700 to be detected as an RX 5500. Have you been able to run any other tests, for example this one:

/opt/rocm_sdk_612/docs/examples/pytorch/pytorch_cpu_vs_gpu_simple_benchmark_jupyter.sh

lamikr (Owner) commented Jan 10, 2025

In the meantime, I tested vLLM just in case with a fresh build on an AMD 7700S GPU and Ubuntu 24.04. Here is the successful output from the first test.

```
$ source /opt/rocm_sdk_612/bin/env_rocm.sh
$ cd /opt/rocm_sdk_612/docs/examples/llm/vllm
$ ./test_questions_and_answers.sh

WARNING 01-09 18:41:01 rocm.py:13] fork method is not supported by ROCm. VLLM_WORKER_MULTIPROC_METHOD is overridden to spawn instead.
[2025-01-09 18:41:02,859] [INFO] [real_accelerator.py:203:get_accelerator] Setting ds_accelerator to cuda (auto detect)
config.json: 100%|███████████████████████████████████████████████████████| 644/644 [00:00<00:00, 5.51MB/s]
INFO 01-09 18:41:05 config.py:928] Disabled the custom all-reduce kernel because it is not supported on AMD GPUs.
INFO 01-09 18:41:05 llm_engine.py:226] Initializing an LLM engine (v0.6.3.dev5+g5636751a) with config: model='facebook/opt-350m', speculative_config=None, tokenizer='facebook/opt-350m', skip_tokenizer_init=False, tokenizer_mode=auto, revision=None, override_neuron_config=None, rope_scaling=None, rope_theta=None, tokenizer_revision=None, trust_remote_code=False, dtype=torch.float16, max_seq_len=2048, download_dir=None, load_format=LoadFormat.AUTO, tensor_parallel_size=1, pipeline_parallel_size=1, disable_custom_all_reduce=True, quantization=None, enforce_eager=False, kv_cache_dtype=auto, quantization_param_path=None, device_config=cuda, decoding_config=DecodingConfig(guided_decoding_backend='outlines'), observability_config=ObservabilityConfig(otlp_traces_endpoint=None, collect_model_forward_time=False, collect_model_execute_time=False), seed=0, served_model_name=facebook/opt-350m, use_v2_block_manager=False, num_scheduler_steps=1, multi_step_stream_outputs=False, enable_prefix_caching=False, use_async_output_proc=True, use_cached_outputs=False, mm_processor_kwargs=None)
tokenizer_config.json: 100%|█████████████████████████████████████████████| 685/685 [00:00<00:00, 4.31MB/s]
vocab.json: 100%|██████████████████████████████████████████████████████| 899k/899k [00:00<00:00, 6.49MB/s]
merges.txt: 100%|██████████████████████████████████████████████████████| 456k/456k [00:00<00:00, 3.20MB/s]
special_tokens_map.json: 100%|███████████████████████████████████████████| 441/441 [00:00<00:00, 2.73MB/s]
/opt/rocm_sdk_612/lib/python3.11/site-packages/transformers/tokenization_utils_base.py:1617: FutureWarning: clean_up_tokenization_spaces was not set. It will be set to True by default. This behavior will be deprecated in transformers v4.45, and will be then set to False by default. For more details check this issue: huggingface/transformers#31884
warnings.warn(
generation_config.json: 100%|█████████████████████████████████████████████| 137/137 [00:00<00:00, 745kB/s]
INFO 01-09 18:41:07 selector.py:121] Using ROCmFlashAttention backend.
_ON_GCN5: True
INFO 01-09 18:41:07 model_runner.py:1014] Starting to load model facebook/opt-350m...
INFO 01-09 18:41:07 selector.py:121] Using ROCmFlashAttention backend.
INFO 01-09 18:41:08 weight_utils.py:242] Using model weights format ['*.bin']
pytorch_model.bin: 100%|███████████████████████████████████████████████| 663M/663M [01:00<00:00, 11.0MB/s]
Loading pt checkpoint shards: 0% Completed | 0/1 [00:00<?, ?it/s]
/opt/rocm_sdk_612/lib/python3.11/site-packages/vllm/model_executor/model_loader/weight_utils.py:424: FutureWarning: You are using torch.load with weights_only=False (the current default value), which uses the default pickle module implicitly. It is possible to construct malicious pickle data which will execute arbitrary code during unpickling (See https://github.com/pytorch/pytorch/blob/main/SECURITY.md#untrusted-models for more details). In a future release, the default value for weights_only will be flipped to True. This limits the functions that could be executed during unpickling. Arbitrary objects will no longer be allowed to be loaded via this mode unless they are explicitly allowlisted by the user via torch.serialization.add_safe_globals. We recommend you start setting weights_only=True for any use case where you don't have full control of the loaded file. Please open an issue on GitHub for any issues related to this experimental feature.
state = torch.load(bin_file, map_location="cpu")
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 1.87it/s]
Loading pt checkpoint shards: 100% Completed | 1/1 [00:00<00:00, 1.87it/s]

INFO 01-09 18:42:09 model_runner.py:1025] Loading model weights took 0.6178 GB
INFO 01-09 18:42:25 gpu_executor.py:122] # GPU blocks: 4006, # CPU blocks: 2730
INFO 01-09 18:42:28 model_runner.py:1329] Capturing the model for CUDA graphs. This may lead to unexpected consequences if the model is not static. To run the model in eager mode, set 'enforce_eager=True' or use '--enforce-eager' in the CLI.
INFO 01-09 18:42:28 model_runner.py:1333] CUDA graphs can take additional 1~3 GiB memory per GPU. If you are running out of memory, consider decreasing gpu_memory_utilization or enforcing eager mode. You can also reduce the max_num_seqs as needed to decrease memory usage.
INFO 01-09 18:42:41 model_runner.py:1456] Graph capturing finished in 13 secs.
Processed prompts: 100%|█| 4/4 [00:02<00:00, 1.76it/s, est. speed input: 26.46 toks/s, output: 138.47 tok

Prompt: "Tell me about the quote 'Float like a butterfly, sting like a bee'"
Answer:
"\nIt's a quote from the movie Safe Haven."

Prompt: 'Will Lauri Markkanen be traded to Golden State Warriors?'
Answer:
'\nThe Golden State Warriors have reportedly made the league's most costly acquisition.\nIf the report is accurate, a trade involving Raptors guard Lauri Markkanen could be on the cards.\nOn Thursday, ESPN's Adrian Wojnarowski reported that the Warriors are "ready to deal" the former Michigan State guard.\nHowever, Markkanen will reportedly remain with the Raptors until the final year of his contract.\nThe Warriors will reportedly hold a draft lottery on June 26, one day before the start of the season.\nMarkkanen has been a frustrating addition to the Raptors' roster, especially with Kawhi Leonard out'

Prompt: 'Tell me about the story of Paavo Nurmi Statues in Swedish Vasa war ship?'
Answer:
'\nSome sort of spell? I think so. Some kind of all-powerful weapon? Something. Like this?\nHe is not the only one. Maybe a black knight, a black knight would be cooler than this.'

Prompt: 'Who is Diego Maradona?'
Answer:
"\nDiego Maradona was born on September 6, 1986 and died on December 26, 2014. His death is deemed an accident by the World Medical Association.\nHe was a politician and was a part of Argentina's national soccer team and made his professional debut in the 1999 World Cup.\nWho was Diego Maradona?\nBorn Diego Maradona, the Argentine soccer legend played for Argentina from 1995 to 2004. He won two World Cups and won the 1998 FIFA World Cup as well as the 2006 European Championship.\nHe was a member of the national soccer team and won four FIFA World Cups and was the only player to"
```

lamikr (Owner) commented Jan 10, 2025

With stable diffusion I am getting the same error and no picture unless I slide the "sampling steps" to smaller than 20.
Can you try lowering it first, to 5 for example, and then check whether you get a picture?
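
If it is easier to test without the browser, the same request can also be driven through the webui's API; a minimal sketch, assuming the stock txt2img endpoint on the default port and that the webui was started with the --api flag:

```
# Minimal sketch: request a small image with few sampling steps via the
# stock txt2img endpoint (default port 7860; requires launching with --api).
curl -s http://127.0.0.1:7860/sdapi/v1/txt2img \
  -H "Content-Type: application/json" \
  -d '{"prompt": "a lovely cat", "steps": 5, "width": 256, "height": 256}'
```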

Eelluu (Contributor, Author) commented Jan 10, 2025

I've managed to get vLLM working after forcing a reinstall with the --no-deps argument (sketched below), and the test now works correctly.
I've also tried reducing the sampling steps, and the limit I've found is 6: starting from 7 I get no image. With stable-diffusion.cpp I get cudaMalloc failed: out of memory, which I also get in the webui if I don't reduce the size of the image (I guess due to the small 4 GB of VRAM). When the image size is reduced, SD.cpp works fine.
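
For reference, the reinstall was along these lines; the wheel path is a placeholder, not the exact file from my build, and the stable-diffusion.cpp command uses illustrative model/prompt values with its standard size and steps flags:

```
# Force-reinstall the locally built vLLM wheel without pulling dependencies.
# The wheel path is a placeholder; use the one produced by your build.
source /opt/rocm_sdk_612/bin/env_rocm.sh
pip install --force-reinstall --no-deps /path/to/vllm-*.whl

# stable-diffusion.cpp with a reduced image size to fit in 4 GB of VRAM.
# Model path and prompt are illustrative.
./sd -m /path/to/model.safetensors -p "a lovely cat" -W 256 -H 256 --steps 6
```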

lamikr (Owner) commented Jan 10, 2025

Thanks for confirming. I guess it's fine to close this now, unless you want to add some more documentation to README.md in case somebody else runs into the same problem?

Eelluu closed this as completed Jan 11, 2025