How can I use multi gpu on stable diffusion pipeline?

I need just inference.

As do I :100:

+1
I tried batching by combining the prompts into a single call. Throughput dropped from 4 it/s to 1 it/s for a batch of 4 prompts.

I tried adding multiple T4 GPUs to the gcloud VM, but could not get it to work. Monitoring nvidia-smi shows the second GPU is not being used at all.

Same here. Did you find any insights on how to do it? I tried the device_map="auto" option, but still only one GPU is used.

I think the only option, at the moment, is to create multiple instances. So, if you want to run a batch, run one instance for each GPU that you have. Pin each instance to its own GPU, offset each instance's starting seed by 1, and then step the seed by 4 per image (if using 4 GPUs), so each one is processing a different output with the same settings.

A similar setup works if you want to produce more passes from the same seed. Offset the value you are varying by one step per instance, and have each instance increment its own value by 4x that step.
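One way to read the seed scheme described above is sketched below. This is only an illustration, not something from a library: the function name `seeds_for_instance` and the base seed of 1000 are made up for the example. Each instance starts at the base seed plus its GPU index and then steps by the number of GPUs, so no two instances ever use the same seed.

```python
def seeds_for_instance(base_seed, gpu_index, num_gpus, num_images):
    """Return the seeds one instance should use.

    Hypothetical helper: instance `gpu_index` starts at base_seed + gpu_index
    and steps by num_gpus, so the instances interleave without overlapping.
    """
    return [base_seed + gpu_index + k * num_gpus for k in range(num_images)]


if __name__ == "__main__":
    # With 4 GPUs, each instance gets every 4th seed.
    for gpu in range(4):
        print(f"GPU {gpu}:", seeds_for_instance(1000, gpu, 4, 3))
```

Each instance would then pass its own seeds to whatever UI or script it is running, with all other settings identical.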

Natively, “we are not there yet”, but I am sure they are working on it. I know a few GUIs do this to create animations. Each “frame” goes to a different GPU. But, only ONE GPU can work on ONE output at a time. Creation, at the moment, is linear. That will stay true until they start making “larger creations” that can be built from “areas”, with the results merged back into one singular output. (Like how a 3D rendering program works, in tiny little boxes, to create one large image.)

It would be nice to run batch img_to_img on two GPUs. It could be possible by setting CUDA_VISIBLE_DEVICES to the index of a specific GPU before launching each WebUI instance.
127.0.0.1:7860 on GPU 0 and 127.0.0.1:7861 on GPU 1, for example.
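A rough sketch of that idea in Python, untested against any particular WebUI. The command `python launch.py` and the `--port` flag are assumptions about how your WebUI is started, so swap in whatever you normally run. The helper names `build_launch_plan` and `launch_all` are made up for this example. The only part that is standard is `CUDA_VISIBLE_DEVICES`, which hides every GPU except the listed one from the process.

```python
import os
import subprocess


def build_launch_plan(num_gpus, base_port=7860, command=("python", "launch.py")):
    """Build one (environment, argv) pair per GPU.

    `command` and the `--port` flag are assumptions; adjust for your WebUI.
    """
    plan = []
    for gpu in range(num_gpus):
        # Each process only sees a single GPU, which it addresses as cuda:0.
        env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu))
        args = [*command, "--port", str(base_port + gpu)]
        plan.append((env, args))
    return plan


def launch_all(plan):
    """Start every instance in the plan and return the process handles."""
    return [subprocess.Popen(args, env=env) for env, args in plan]
```

Calling `launch_all(build_launch_plan(2))` from the WebUI directory would start one instance on port 7860 with GPU 0 and one on port 7861 with GPU 1.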

Will have to try it out.

I’ve been trying to achieve something similar to this. I am working on a 2 GPU instance. The idea is basically to create 2n images, each GPU working on n images in parallel. I was successfully able to load different pipelines on to each GPU but getting this error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1! Any thoughts on what may be the issue?

curious, i am trying an old gpu mining rig to see if this is possible too, not very stable though, still working on it. but plan is to eventually put the 3060’s together…

hey all, I would recommend creating two separate pipeline instances and moving them independently to the two separate gpus.
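A minimal sketch of that suggestion, assuming `diffusers` and `torch` are installed and at least two CUDA devices are visible. The helper names and the `model_id` argument are placeholders, not part of any library. One guess about the "Expected all tensors to be on the same device" error mentioned earlier in the thread: it can show up when a generator or tensor created on one GPU is passed to the pipeline living on the other, so the sketch creates each `torch.Generator` on the same device as its pipeline.

```python
from concurrent.futures import ThreadPoolExecutor


def split_round_robin(items, n):
    """Deal items out into n lists, one per GPU."""
    return [items[i::n] for i in range(n)]


def load_pipeline(model_id, device):
    # Imported lazily so the helper above works without torch installed.
    import torch
    from diffusers import StableDiffusionPipeline

    pipe = StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    return pipe.to(device)


def generate(pipe, device, prompts, seed):
    import torch

    # Keep the generator on the same device as the pipeline it feeds.
    generator = torch.Generator(device=device).manual_seed(seed)
    return pipe(prompts, generator=generator).images


def run_on_all_gpus(model_id, prompts, devices=("cuda:0", "cuda:1"), base_seed=0):
    """Run one pipeline per device in parallel threads.

    Images come back grouped by device, not in the original prompt order.
    """
    pipes = [load_pipeline(model_id, d) for d in devices]
    chunks = split_round_robin(list(prompts), len(devices))
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        futures = [
            pool.submit(generate, pipe, device, chunk, base_seed + i)
            for i, (pipe, device, chunk) in enumerate(zip(pipes, devices, chunks))
            if chunk  # skip devices that received no prompts
        ]
        return [image for future in futures for image in future.result()]
```

You would call it with your own checkpoint, for example `run_on_all_gpus("<your-model-id>", ["a dog", "a cat", "a bird", "a fish"])`. Each GPU holds a full copy of the model, so this speeds up throughput but does not reduce per-GPU memory use.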

I am following behind a little later, but have a similar rig. Did you ever have success with this?

still working on it, but I have been hearing a lot about multiple pipelines, so still looking into that. Btw, getting three 8 GB 1080s ready for this, somehow.

I’d recommend our new tutorial in Accelerate: Distributed Inference with 🤗 Accelerate
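For reference, the pattern from that tutorial looks roughly like the sketch below, assuming `accelerate`, `diffusers` and `torch` are installed. The function names `main` and `output_name` and the file naming are made up for this example, and `model_id` is a placeholder for your own checkpoint. Each process loads a full copy of the pipeline onto its own GPU and receives its own slice of the prompts.

```python
def output_name(process_index, image_index):
    """Hypothetical file naming so processes do not overwrite each other."""
    return f"result_p{process_index}_{image_index}.png"


def main(model_id, prompts):
    # Imported here so the naming helper can be used without these packages.
    import torch
    from accelerate import PartialState
    from diffusers import DiffusionPipeline

    state = PartialState()  # one process per GPU under `accelerate launch`
    pipe = DiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
    pipe.to(state.device)

    # Each process only sees its own share of the prompt list.
    with state.split_between_processes(prompts) as my_prompts:
        for i, prompt in enumerate(my_prompts):
            image = pipe(prompt).images[0]
            image.save(output_name(state.process_index, i))
```

To try it, add a call such as `main("<your-model-id>", ["a dog", "a cat"])` at the bottom of the script and start it with `accelerate launch --num_processes=2 script.py`. This is the data-parallel approach (one input per GPU), not the one that splits a single model across GPUs.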

I checked it, but didn’t find a way to use 8 CUDA devices with the second method, which is about loading chunks of the model onto multiple GPUs to save memory:
=> 1. Loading parts of a model onto each GPU and processing a single input at one time