I need just inference.
As do I.
+1
I tried batching by combining the prompts into a single call. The speed dropped from 4 it/s to 1 it/s for a batch of 4 prompts.
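For reference, batching with the diffusers StableDiffusionPipeline just means passing a list of prompts in one call; a minimal sketch is below (the model ID and prompts are placeholders). Note that it/s counts denoising steps over the whole batch, so 1 it/s on 4 prompts is roughly the same image throughput as 4 it/s on a single prompt.

```python
import torch
from diffusers import StableDiffusionPipeline

# Placeholder checkpoint; swap in whatever model you are actually using.
pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to("cuda")

# Batching = passing a list of prompts in a single call.
prompts = [
    "a photo of an astronaut riding a horse",
    "a watercolor painting of a lighthouse",
    "a bowl of ramen, studio lighting",
    "a low-poly render of a fox",
]
images = pipe(prompts, num_inference_steps=30).images  # one image per prompt
```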
I tried adding multiple T4 GPUs to the gcloud VM and running on them, without success. Monitoring nvidia-smi
shows the second GPU is not being used at all.
Same here. Did you find any insights on how to do it? I tried the device_map="auto" option, but still only one GPU is used.
I think the only option, at the moment, is to create multiple instances. So, if you want to run a batch, run one instance for each GPU that you have: pin each instance to its own GPU, give each instance a seed offset of +1 over the previous one, and advance every instance's seed by 4 per batch (if using 4 GPUs), so each one is producing a different output with the same settings.
The setup is similar if you want to produce more passes starting from the same seed: keep the per-instance offset of 1, and have each individual instance step its seed by 4 (the number of GPUs) on every pass.
Natively, "we are not there yet", but I am sure they are working on it. I know a few GUIs do this to create animations: each "frame" goes to a different GPU. But only ONE GPU can work on ONE output at a time; creation, at the moment, is linear, and will stay that way until they start building larger creations from "areas" whose results get merged back into one singular output (like how a 3D rendering program renders tiny little boxes to create one large image).
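If I'm reading that scheme right, the per-instance seed bookkeeping would look roughly like this. This is only a sketch: base_seed, num_gpus, instance_index and the model ID are made-up values for illustration.

```python
import torch
from diffusers import StableDiffusionPipeline

# Hypothetical values for illustration.
base_seed = 1234
num_gpus = 4
instance_index = 0          # 0..3, one value per launched instance
device = f"cuda:{instance_index}"

pipe = StableDiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
).to(device)

prompt = "a photo of an astronaut riding a horse"
for pass_idx in range(3):  # each pass advances every instance's seed by num_gpus
    seed = base_seed + instance_index + pass_idx * num_gpus
    generator = torch.Generator(device=device).manual_seed(seed)
    image = pipe(prompt, generator=generator).images[0]
    image.save(f"out_gpu{instance_index}_seed{seed}.png")
```

Launching four copies of this script, one per instance_index, gives every GPU a disjoint set of seeds while all of them share the same settings.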
It could be nice to run batch img_to_img on two GPUs. It should be possible by setting CUDA_VISIBLE_DEVICES to the index of a specific GPU before launching each WebUI instance.
127.0.0.1:7860 on GPU 0 and 127.0.0.1:7861 on GPU 1, for example.
Will have to try it out.
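Something like the launcher below should do it. The launch.py entry point and the --port flag are assumptions based on how most WebUI forks are started, so adjust them to your install.

```python
import os
import subprocess

# One WebUI process per GPU: each child only sees a single device,
# so inside that process the GPU always appears as cuda:0.
processes = []
for gpu_index, port in enumerate([7860, 7861]):
    env = dict(os.environ, CUDA_VISIBLE_DEVICES=str(gpu_index))
    processes.append(
        subprocess.Popen(
            ["python", "launch.py", "--port", str(port)],  # assumed entry point and flag
            env=env,
        )
    )

for p in processes:
    p.wait()
```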
I’ve been trying to achieve something similar to this. I am working on a 2-GPU instance. The idea is basically to create 2n images, with each GPU working on n images in parallel. I was successfully able to load different pipelines onto each GPU, but I am getting this error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
Any thoughts on what may be the issue?
Curious. I am trying an old GPU mining rig to see if this is possible too. Not very stable though, still working on it. The plan is to eventually put the 3060s together…
Hey all, I would recommend creating two separate pipeline instances and moving them independently to the two separate GPUs.
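A minimal sketch of that suggestion with the diffusers StableDiffusionPipeline (model ID and prompt are placeholders). The usual cause of the "Expected all tensors to be on the same device" error is a generator or latent created on a different device than the pipeline that consumes it, so keep everything per GPU:

```python
import threading
import torch
from diffusers import StableDiffusionPipeline

model_id = "runwayml/stable-diffusion-v1-5"  # placeholder checkpoint

# One fully independent pipeline per GPU.
pipes = [
    StableDiffusionPipeline.from_pretrained(model_id, torch_dtype=torch.float16).to(f"cuda:{i}")
    for i in range(2)
]

results = [None, None]

def run(i, prompt):
    # Create the generator on the same device as the pipeline it feeds.
    generator = torch.Generator(device=f"cuda:{i}").manual_seed(42 + i)
    results[i] = pipes[i](prompt, generator=generator).images[0]

threads = [
    threading.Thread(target=run, args=(i, "a photo of an astronaut riding a horse"))
    for i in range(2)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```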
I am following behind a little later, but have a similar rig. Did you ever have success with this?
Still working on it, but I have been hearing a lot about multiple pipelines, so I'm still looking into that. Btw, getting three 8 GB 1080s ready for this, somehow.
I’d recommend our new tutorial in Accelerate: Distributed Inference with 🤗 Accelerate
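For anyone who hasn't opened the tutorial yet, its first approach (a full pipeline copy per GPU, with the prompt list split across processes) boils down to roughly this:

```python
import torch
from accelerate import PartialState
from diffusers import DiffusionPipeline

# Placeholder checkpoint; any diffusers pipeline works the same way.
pipe = DiffusionPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", torch_dtype=torch.float16
)

distributed_state = PartialState()
pipe.to(distributed_state.device)  # each process gets its own GPU

# Each process receives its own slice of the prompt list.
with distributed_state.split_between_processes(["a dog", "a cat"]) as prompt:
    result = pipe(prompt).images[0]
    result.save(f"result_{distributed_state.process_index}.png")
```

You then start it with something like `accelerate launch --num_processes=2 run_distributed.py` (the script name here is just a placeholder).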
I checked it, but didn't find a way to use 8 CUDA devices with the second method, which is about loading chunks of the model onto multiple GPUs to save memory
=> 1. Loading parts of a model onto each GPU and processing a single input at one time
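As far as I know, that second method is Accelerate's big-model-inference path, which in the transformers ecosystem is exposed through device_map="auto". A rough sketch with a placeholder language-model checkpoint follows; sharding a full diffusion pipeline this way would need the same treatment applied per component, which isn't shown here.

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Placeholder checkpoint; any model large enough to need sharding works the same way.
checkpoint = "bigscience/bloom-7b1"

tokenizer = AutoTokenizer.from_pretrained(checkpoint)

# device_map="auto" asks Accelerate to split the layers across every visible GPU
# (all 8 in your case), so each card only holds a slice of the weights.
model = AutoModelForCausalLM.from_pretrained(
    checkpoint, device_map="auto", torch_dtype=torch.float16
)
print(model.hf_device_map)  # shows which layer ended up on which GPU

# Inputs go to the first device; Accelerate's hooks move activations between GPUs.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to("cuda:0")
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```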