Multiple threads of Stable diffusion Inpainting slows down the inference on same GPU

garg-aayush · December 6, 2022, 4:05am

I am using the Stable Diffusion inpainting pipeline to run inference on an A100 (40 GB) GPU. For a 512×512 image it takes approx. 3 s per image and uses about 5 GB of GPU memory.

To get faster inference, I am trying to run 2 threads (2 inference scripts). However, as soon as I start them simultaneously, the inference time increases to ~6 s per thread, so the effective time is still ~3 s per image.

I am unable to understand why this is so. I still have a lot of memory available on the GPU (about 35 GB) and 32 GB of CPU RAM.

Can someone help me in this regard?

pcuenq · December 6, 2022, 9:50am

Hi @garg-aayush, I think that’s to be expected to some extent. At the end of the day, your GPU needs to run twice as many computations.

Instead of threading, I’d recommend you accumulate your requests in a batch and pass the whole batch for inference. This should scale a bit better, I think.
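A minimal sketch of what that batching could look like, assuming a recent diffusers release in which the inpainting pipeline's `prompt`, `image` and `mask_image` arguments accept lists of equal length. The helper names `chunked` and `inpaint_in_batches` and the default `batch_size` are hypothetical, and `pipe` stands for whatever pipeline object is already loaded.

```python
from typing import Any, Iterator, List, Sequence


def chunked(items: Sequence[Any], size: int) -> Iterator[List[Any]]:
    """Yield consecutive slices of `items` holding at most `size` elements."""
    if size < 1:
        raise ValueError("size must be >= 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


def inpaint_in_batches(pipe, prompts, images, masks, batch_size=4):
    """Run queued inpainting requests through `pipe` one batch at a time.

    Assumption: `pipe` behaves like diffusers' StableDiffusionInpaintPipeline,
    i.e. it accepts lists for prompt/image/mask_image and returns an object
    whose `.images` list is in the same order as the inputs.
    """
    if not (len(prompts) == len(images) == len(masks)):
        raise ValueError("prompts, images and masks must have the same length")
    results = []
    for idx in chunked(range(len(prompts)), batch_size):
        out = pipe(
            prompt=[prompts[i] for i in idx],
            image=[images[i] for i in idx],
            mask_image=[masks[i] for i in idx],
        )
        results.extend(out.images)
    return results
```

The batch size that fits depends on the model and resolution, so it is worth raising it step by step while watching GPU memory.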

garg-aayush · December 6, 2022, 1:08pm

Actually, I never faced this issue when running inference with other models (segmentation or classification). However, it seems to be the case here.

I looked at how many SMs are being utilised in the Stable Diffusion pipeline case. It shows the SMs being utilised at max capacity:
Screenshot 2022-12-06 at 6.24.45 PM

Maybe this is the reason why multiple threads are not helping. Maybe this is what you meant by double the computations.
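The timings reported at the top of the thread are consistent with a GPU that is already saturated: two scripts at ~6 s each give the same throughput as one script at ~3 s. A small worked calculation, where `seconds_per_image` is a hypothetical helper written only for this illustration:

```python
def seconds_per_image(latency_s: float, concurrent_streams: int) -> float:
    """Effective wall-clock seconds per image when `concurrent_streams`
    scripts each finish one image every `latency_s` seconds."""
    return latency_s / concurrent_streams


single = seconds_per_image(3.0, 1)  # one script, ~3 s per image
double = seconds_per_image(6.0, 2)  # two scripts, ~6 s per image each
print(single, double)  # 3.0 3.0
```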

Thanks

Hishambarakat · March 14, 2025, 8:17am

I'm trying to figure this issue out as well. I found that there are two ways to perform batching with Stable Diffusion using diffusers. The first is to use the num_images_per_prompt argument, but in my experience it is faster to provide the prompts as a list, like prompt = ["prompt 1", "prompt 2"].

But as for running multiple inferences concurrently with diffusion models, I am also struggling with this and looking for answers.
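The two styles can be sketched side by side as follows. `num_images_per_prompt` and list-valued prompts are existing diffusers pipeline arguments; the wrapper names `same_prompt_batch`, `prompt_list_batch` and `timed` are hypothetical, and extra keyword arguments (for example `image` and `mask_image` for inpainting) are simply forwarded. Which style is faster is best measured on your own setup, so a small timing helper is included.

```python
import time


def same_prompt_batch(pipe, prompt, n, **kwargs):
    """Generate `n` images from a single prompt in one call."""
    return pipe(prompt, num_images_per_prompt=n, **kwargs).images


def prompt_list_batch(pipe, prompts, **kwargs):
    """Generate one image per prompt by passing the prompts as a list."""
    return pipe(list(prompts), **kwargs).images


def timed(fn, *args, **kwargs):
    """Return (result, elapsed_seconds) for a single call of `fn`."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start
```

For example, `timed(prompt_list_batch, pipe, ["prompt 1", "prompt 2"])` returns the images together with the wall-clock time of that call.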

John6666 · March 14, 2025, 11:22am

There used to be a problem with multi-threading in Diffusers. If you have 40GB of VRAM, I think the fastest way is to use multi-processes, even though it’s inefficient with VRAM…
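A rough sketch of the multi-process approach, under these assumptions: `load_pipeline` is a hypothetical module-level function that builds a pipeline and moves it to the GPU, and the names `shard`, `worker` and `run_multiprocess` are illustrative. Each process loads its own copy of the weights, which is where the extra VRAM goes. The `spawn` start method is used because CUDA cannot be used in a forked child once the parent has initialised it.

```python
import multiprocessing as mp


def shard(items, num_workers):
    """Round-robin split of `items` into `num_workers` lists."""
    return [list(items[rank::num_workers]) for rank in range(num_workers)]


def worker(rank, prompts, load_pipeline, results):
    """Runs in its own process: load a private pipeline, then generate."""
    pipe = load_pipeline()  # each process holds its own copy of the weights
    for prompt in prompts:
        image = pipe(prompt).images[0]
        results.put((rank, prompt, image))


def run_multiprocess(prompts, load_pipeline, num_workers=2):
    """Start one process per shard and collect (rank, prompt, image) tuples.

    Note: if a worker crashes, `results.get()` will block; a real service
    would add timeouts and error reporting.
    """
    ctx = mp.get_context("spawn")
    results = ctx.Queue()
    procs = [
        ctx.Process(target=worker, args=(rank, part, load_pipeline, results))
        for rank, part in enumerate(shard(prompts, num_workers))
    ]
    for proc in procs:
        proc.start()
    collected = [results.get() for _ in prompts]  # drain before joining
    for proc in procs:
        proc.join()
    return collected
```

Call `run_multiprocess` from under an `if __name__ == "__main__":` guard, since `spawn` re-imports the main module in every child process.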

github.com/huggingface/diffusers

Concurrent Thread Failure in Image Inference: StableDiffusionPipeline

opened 09:06AM - 05 Jun 23 UTC

closed 01:52AM - 13 Jun 23 UTC

Fantast616

bug

### Describe the bug

When I attempt to concurrently initiate two threads for image inference in one pipe, the process fails and the following information is displayed as logs during the inference. I intend to submit a pull request for thread-safety in one inference pipe if anyone else need.

### Reproduction

```python
from diffusers import StableDiffusionPipeline
import torch
import threading

model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

def infer_one():
    pipe("a photo of an astronaut riding a horse on mars").images[0]

x1 = threading.Thread(target=infer_one)
x1.start()
x2 = threading.Thread(target=infer_one)
x2.start()
```

### Logs

```shell
Exception in thread Thread-1 (infer_one): | 0/50 [00:00<?, ?it/s]
Traceback (most recent call last):
  File "/opt/miniconda3/envs/diffusers/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/opt/miniconda3/envs/diffusers/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/workdata/A40/main.py", line 10, in infer_one
    pipe("a photo of an astronaut riding a horse on mars").images[0]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/diffusers/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/workdata/yehang/projects/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 710, in __call__
    latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/workdata/yehang/projects/diffusers/src/diffusers/schedulers/scheduling_pndm.py", line 221, in step
    return self.step_plms(model_output=model_output, timestep=timestep, sample=sample, return_dict=return_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/workdata/yehang/projects/diffusers/src/diffusers/schedulers/scheduling_pndm.py", line 337, in step_plms
    prev_sample = self._get_prev_sample(sample, timestep, prev_timestep, model_output)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/workdata/yehang/projects/diffusers/src/diffusers/schedulers/scheduling_pndm.py", line 371, in _get_prev_sample
    alpha_prod_t = self.alphas_cumprod[timestep]
                   ~~~~~~~~~~~~~~~~~~~^^^^^^^^^^
IndexError: index 1001 is out of bounds for dimension 0 with size 1000
  0%| | 0/50 [00:04<?, ?it/s]
Exception in thread Thread-2 (infer_one):
Traceback (most recent call last):
  File "/opt/miniconda3/envs/diffusers/lib/python3.11/threading.py", line 1038, in _bootstrap_inner
    self.run()
  File "/opt/miniconda3/envs/diffusers/lib/python3.11/threading.py", line 975, in run
    self._target(*self._args, **self._kwargs)
  File "/mnt/workdata/A40/main.py", line 10, in infer_one
    pipe("a photo of an astronaut riding a horse on mars").images[0]
    ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/opt/miniconda3/envs/diffusers/lib/python3.11/site-packages/torch/utils/_contextlib.py", line 115, in decorate_context
    return func(*args, **kwargs)
           ^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/workdata/yehang/projects/diffusers/src/diffusers/pipelines/stable_diffusion/pipeline_stable_diffusion.py", line 710, in __call__
    latents = self.scheduler.step(noise_pred, t, latents, **extra_step_kwargs, return_dict=False)[0]
              ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/workdata/yehang/projects/diffusers/src/diffusers/schedulers/scheduling_pndm.py", line 221, in step
    return self.step_plms(model_output=model_output, timestep=timestep, sample=sample, return_dict=return_dict)
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/workdata/yehang/projects/diffusers/src/diffusers/schedulers/scheduling_pndm.py", line 337, in step_plms
    prev_sample = self._get_prev_sample(sample, timestep, prev_timestep, model_output)
                  ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/mnt/workdata/yehang/projects/diffusers/src/diffusers/schedulers/scheduling_pndm.py", line 396, in _get_prev_sample
    sample_coeff * sample - (alpha_prod_t_prev - alpha_prod_t) * model_output / model_output_denom_coeff
    ~~~~~~~~~~~~~^~~~~~~~
TypeError: unsupported operand type(s) for *: 'Tensor' and 'NoneType'
```

### System Info

- `diffusers` version: 0.17.0.dev0
- Platform: Linux-5.15.0-72-generic-x86_64-with-glibc2.31
- Python version: 3.11.3
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- Huggingface_hub version: 0.14.1
- Transformers version: 4.29.1
- Accelerate version: 0.19.0
- xFormers version: not installed
- Using GPU in script?: <fill in>
- Using distributed or parallel set-up in script?: <fill in>

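Both tracebacks in the linked issue end inside `scheduler.step`, which suggests the two threads were mutating state on the single scheduler shared through `pipe`. One conservative workaround, sketched here as an assumption and not as the fix adopted in that issue, is to serialise calls with a lock. `LockedPipeline` is a hypothetical wrapper name.

```python
import threading


class LockedPipeline:
    """Wrap a pipeline so that only one thread runs inference at a time."""

    def __init__(self, pipe):
        self._pipe = pipe
        self._lock = threading.Lock()

    def __call__(self, *args, **kwargs):
        # Hold the lock for the whole denoising loop so that scheduler
        # state is never touched by two threads at once.
        with self._lock:
            return self._pipe(*args, **kwargs)
```

Wrapping the pipeline as `safe_pipe = LockedPipeline(pipe)` and calling `safe_pipe(...)` from each thread avoids the interleaving, but it does not add throughput because requests still run one after another.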