
⚡voltaML-fast-stable-diffusion 🔥 🔥

A lightweight library that accelerates Stable Diffusion and Dreambooth models for fast inference with a single line of code.

🔥Accelerate computer vision, NLP and other models with voltaML. Up to 10x speedup in inference🔥

Installation

voltaML Docker Container 🐳

git clone https://github.com/VoltaML/voltaML-fast-stable-diffusion.git
cd voltaML-fast-stable-diffusion

sudo docker pull voltaml/volta_diffusion:v0.2 

sudo docker run -it --gpus all --ipc=host --ulimit memlock=-1 --ulimit stack=67108864 -v $(pwd):/code --rm voltaml/volta_diffusion:v0.2 
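Before running the commands above, it can help to confirm that the required tools are installed. The following is a minimal sketch, not part of voltaML: the helper name `missing_tools` is hypothetical, and the check only looks for executables on PATH. It does not verify that the NVIDIA Container Toolkit, which `--gpus all` relies on, is configured.

```python
import shutil


def missing_tools(tools):
    """Return the tool names from `tools` that are not found on PATH."""
    return [tool for tool in tools if shutil.which(tool) is None]


if __name__ == "__main__":
    # git clones the repository, docker runs the container, and the presence
    # of nvidia-smi suggests that an NVIDIA driver is installed on the host.
    missing = missing_tools(["git", "docker", "nvidia-smi"])
    if missing:
        print("Missing from PATH:", ", ".join(missing))
    else:
        print("git, docker and nvidia-smi were found on PATH")
```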

Own setup:

Requirements: See the requirements.txt file to set up your own environment.

We recommend using our voltaml/volta_diffusion container or an NVIDIA TensorRT container.

Usage

Hugging Face Login

Log in to your Hugging Face account from the terminal

huggingface-cli login
Token: #enter your huggingface token

Accelerate

bash optimize.sh --model='runwayml/stable-diffusion-v1-5' # local model path or Hugging Face model name

Inference

For TensorRT

python3 volta_infer.py --backend='TRT' --prompt='a gigantic robotic bipedal dinosaur, highly detailed, photorealistic, digital painting, artstation, concept art, sharp focus, illustration, art by greg rutkowski and alphonse mucha'

For PyTorch

python3 volta_infer.py --backend='PT' --prompt='a gigantic robotic bipedal dinosaur, highly detailed, photorealistic, digital painting, artstation, concept art, sharp focus, illustration, art by greg rutkowski and alphonse mucha'
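The two commands above differ only in the `--backend` value, so they are easy to generate from a script, for example when comparing backends over several prompts. The sketch below is illustrative only: `build_infer_command` is a hypothetical helper, and it assumes `volta_infer.py` is run from the repository root with just the `--backend` and `--prompt` flags shown in this README.

```python
import shlex

# Backend values that appear in this README; others may exist.
KNOWN_BACKENDS = ("TRT", "PT")


def build_infer_command(backend, prompt):
    """Build the argument list for one volta_infer.py run."""
    if backend not in KNOWN_BACKENDS:
        raise ValueError(f"unknown backend {backend!r}, expected one of {KNOWN_BACKENDS}")
    return ["python3", "volta_infer.py", f"--backend={backend}", f"--prompt={prompt}"]


if __name__ == "__main__":
    prompt = "a gigantic robotic bipedal dinosaur, highly detailed, photorealistic"
    for backend in KNOWN_BACKENDS:
        cmd = build_infer_command(backend, prompt)
        # Print a shell-safe version of the command. To execute it instead,
        # pass the list to subprocess.run(cmd, check=True).
        print(shlex.join(cmd))
```

Passing the arguments as a list avoids shell quoting problems with prompts that contain commas, quotes or spaces.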

Benchmark

python3 volta_infer.py --backend='TRT' --benchmark

The benchmarks below were measured while generating a 512x512 image with batch size 1 for 50 iterations.

Model                       T4 (it/s)   A10 (it/s)   A100 (it/s)
PyTorch                     4.3         8.8          15.1
Flash attention xformers    5.5         15.6         27.5
VoltaML(TRT)                7.7         17.2         36.1
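To read the table as relative speedups, divide each rate by the PyTorch rate on the same GPU. The sketch below does that arithmetic with the numbers from the table. The function names are illustrative, and the time estimate assumes the reported it/s holds steady across all 50 iterations and ignores model loading and image decoding.

```python
# Iterations per second, copied from the benchmark table above.
ITERATIONS_PER_SECOND = {
    "PyTorch": {"T4": 4.3, "A10": 8.8, "A100": 15.1},
    "Flash attention xformers": {"T4": 5.5, "A10": 15.6, "A100": 27.5},
    "VoltaML(TRT)": {"T4": 7.7, "A10": 17.2, "A100": 36.1},
}


def speedup_over_baseline(results, baseline="PyTorch"):
    """Return each model's rate divided by the baseline rate, per GPU."""
    base = results[baseline]
    return {
        model: {gpu: round(rate / base[gpu], 2) for gpu, rate in rates.items()}
        for model, rates in results.items()
    }


def seconds_for_iterations(rate, iterations=50):
    """Estimate wall-clock seconds for a number of iterations at a given it/s."""
    return round(iterations / rate, 1)


if __name__ == "__main__":
    for model, per_gpu in speedup_over_baseline(ITERATIONS_PER_SECOND).items():
        print(model, per_gpu)
    print("T4, VoltaML(TRT), 50 iterations (s):", seconds_for_iterations(7.7))
```

On these numbers, VoltaML(TRT) is roughly 1.8x, 2.0x and 2.4x faster than PyTorch on the T4, A10 and A100 respectively.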


To-Do:

  • Integrate Flash-attention
  • Integrate AITemplate
  • Try Flash-attention with TensorRT

Contribution:

We invite the open-source community to contribute and help us improve voltaML. Please check out our contribution guide.
