Lightweight library that accelerates Stable Diffusion and Dreambooth into the fastest inference models with a single line of code.

🔥 Accelerate computer vision, NLP, and other models with voltaML. Up to 10X speedup in inference 🔥
```shell
git clone https://github.com/VoltaML/voltaML-fast-stable-diffusion.git
cd voltaML-fast-stable-diffusion

sudo docker pull voltaml/volta_diffusion:v0.2
sudo docker run -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v $(pwd):/code --rm voltaml/volta_diffusion:v0.2
```
Requirements: to set up your own environment, refer to the `requirements.txt` file. However, we recommend using our `voltaml/volta_diffusion` container or the NVIDIA TensorRT container instead.
Log in to your Hugging Face account through the terminal:

```shell
huggingface-cli login
Token: # enter your Hugging Face token
```
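For non-interactive setups (scripts, CI), the token can also be supplied through the standard Hugging Face Hub environment variable instead of the interactive login. Note that this is generic `huggingface_hub` behavior, not something voltaML-specific:

```shell
# Assumption: standard huggingface_hub behavior, not a voltaML flag.
# Exporting the token lets downstream tools authenticate without a prompt.
export HUGGING_FACE_HUB_TOKEN=hf_xxxxxxxxxxxxxxxx  # replace with your own token
```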
```shell
bash optimize.sh --model='runwayml/stable-diffusion-v1-5' # your model path or Hugging Face model name
```
For TensorRT:

```shell
python3 volta_infer.py --backend='TRT' --prompt='a gigantic robotic bipedal dinosaur, highly detailed, photorealistic, digital painting, artstation, concept art, sharp focus, illustration, art by greg rutkowski and alphonse mucha'
```

For PyTorch:

```shell
python3 volta_infer.py --backend='PT' --prompt='a gigantic robotic bipedal dinosaur, highly detailed, photorealistic, digital painting, artstation, concept art, sharp focus, illustration, art by greg rutkowski and alphonse mucha'
```
To run the built-in benchmark:

```shell
python3 volta_infer.py --backend='TRT' --benchmark
```
The benchmarks below were run generating a 512x512 image at batch size 1 for 50 iterations.
| Model | T4 (it/s) | A10 (it/s) | A100 (it/s) |
|---|---|---|---|
| PyTorch | 4.3 | 8.8 | 15.1 |
| Flash attention (xformers) | 5.5 | 15.6 | 27.5 |
| VoltaML (TRT) | 7.7 | 17.2 | 36.1 |
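As a sanity check on the table above, the speedups and per-image latencies implied by these throughput numbers can be computed directly (50 denoising steps per image, as stated above):

```python
# Throughput (it/s) copied from the benchmark table above.
throughput = {
    "PyTorch":       {"T4": 4.3, "A10": 8.8,  "A100": 15.1},
    "xformers":      {"T4": 5.5, "A10": 15.6, "A100": 27.5},
    "VoltaML (TRT)": {"T4": 7.7, "A10": 17.2, "A100": 36.1},
}

STEPS = 50  # iterations per 512x512 image, as in the benchmark

for gpu in ("T4", "A10", "A100"):
    base = throughput["PyTorch"][gpu]
    trt = throughput["VoltaML (TRT)"][gpu]
    speedup = trt / base          # TRT throughput relative to plain PyTorch
    latency = STEPS / trt         # seconds per image with the TRT backend
    print(f"{gpu}: {speedup:.2f}x over PyTorch, {latency:.1f} s/image")
# → T4: 1.79x over PyTorch, 6.5 s/image
# → A10: 1.95x over PyTorch, 2.9 s/image
# → A100: 2.39x over PyTorch, 1.4 s/image
```

So on an A100, the TensorRT backend roughly 2.4x's plain PyTorch and brings a 50-step image down to about 1.4 seconds.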
- Integrate Flash-attention
- Integrate AITemplate
- Try Flash-attention with TensorRT
We invite the open-source community to contribute and help us improve voltaML. Please check out our contribution guide.