Multiple threads of Stable Diffusion inpainting slow down inference on the same GPU

I am using the Stable Diffusion inpainting pipeline to generate inference results on an A100 (40 GB) GPU. For a 512×512 image, it takes approximately 3 s per image and uses about 5 GB of GPU memory.

To get faster inference, I tried running 2 threads (2 inference scripts). However, as soon as I start them simultaneously, the inference time per thread increases to ~6 s, so the effective time is still ~3 s per image.

I am unable to understand why this happens. I still have plenty of free GPU memory (about 35 GB) and a fairly large CPU RAM of 32 GB.

Can someone help me in this regard?

Hi @garg-aayush, I think that's to be expected to some extent; at the end of the day your GPU needs to run twice as many computations.

Instead of threading, I’d recommend you accumulate your requests in a batch and pass the whole batch for inference. This should scale a bit better, I think.
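As a rough sketch of the batching idea: instead of each thread calling the pipeline on its own, incoming prompts go into a shared queue and a single worker drains the queue and issues one batched call. The `run_batch` function below is a hypothetical stand-in for a real batched pipeline call (e.g. in diffusers you would pass lists of prompts, images, and masks to the pipeline in one call):

```python
import queue

def run_batch(prompts):
    # Placeholder for a real batched call. With diffusers this would be
    # something like:
    #   images = pipe(prompt=prompts, image=images, mask_image=masks).images
    return [f"result:{p}" for p in prompts]

def drain_and_batch(requests: queue.Queue, max_batch: int = 4):
    """Pull up to max_batch queued prompts and run them in a single call."""
    batch = []
    while len(batch) < max_batch:
        try:
            batch.append(requests.get_nowait())
        except queue.Empty:
            break
    return run_batch(batch) if batch else []

q = queue.Queue()
for p in ["fill the sky", "remove the car", "add a tree"]:
    q.put(p)

print(drain_and_batch(q))  # all three prompts processed in one batched call
```

The key point is that the GPU sees one larger batch rather than several competing kernel streams, which typically uses the hardware more efficiently.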


Interestingly, I never faced this issue when running inference with other models (segmentation or classification). However, it does seem to be the case here.

I was looking at the number of SMs (streaming multiprocessors) being utilised in the Stable Diffusion pipeline case. It shows the SMs being used at maximum capacity:
[Screenshot, 2022-12-06: GPU monitor showing SM utilisation at full capacity]

Maybe this is the reason multiple threads don't help, and this is what you meant by double the computations: a single pipeline already saturates the SMs.
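The numbers in the thread are consistent with that explanation: if one call already saturates the GPU, two concurrent calls time-slice rather than run in parallel, so per-call latency roughly doubles while throughput stays flat. A back-of-the-envelope check, using the 3 s figure from above:

```python
# Rough model: one inpainting call saturates the GPU's SMs, so N
# concurrent calls time-slice instead of running in parallel.
single_latency_s = 3.0   # one 512x512 image on one thread (from the post)
n_threads = 2

per_thread_latency = single_latency_s * n_threads    # each call now waits
effective_per_image = per_thread_latency / n_threads  # overall throughput

print(per_thread_latency)   # ~6 s per thread, matching the observation
print(effective_per_image)  # still ~3 s per image: no throughput gain
```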

