I’ve been trying to achieve something similar to this. I’m working on a 2-GPU instance, and the idea is to generate 2n images, with each GPU producing n images in parallel. I was able to load a separate pipeline onto each GPU, but I’m getting this error: RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:0 and cuda:1!
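To make the intended split concrete, here is a minimal sketch of dividing 2n prompts between two devices and running one worker per device. It is not my actual code: `split_evenly`, `generate_on_device`, and `generate_parallel` are hypothetical names, and the real pipeline call is replaced by a placeholder so the sketch runs without a GPU. The error means a single operation received tensors from both GPUs, so in a layout like this anything a worker creates (inputs, random generators, intermediate tensors) would need to live on that worker's own device.

```python
from concurrent.futures import ThreadPoolExecutor


def split_evenly(items, num_workers):
    """Split items into num_workers contiguous chunks of near-equal size."""
    base, extra = divmod(len(items), num_workers)
    chunks, start = [], 0
    for i in range(num_workers):
        end = start + base + (1 if i < extra else 0)
        chunks.append(items[start:end])
        start = end
    return chunks


def generate_on_device(device, prompts):
    # Hypothetical placeholder for the real pipeline call (assumption).
    # In real code, every tensor created here would be placed on `device`
    # and only the pipeline loaded onto `device` would be used.
    return [(device, prompt) for prompt in prompts]


def generate_parallel(prompts, devices=("cuda:0", "cuda:1")):
    """Run one worker per device, each handling its own chunk of prompts."""
    chunks = split_evenly(prompts, len(devices))
    with ThreadPoolExecutor(max_workers=len(devices)) as pool:
        # pool.map pairs devices[i] with chunks[i] and preserves order.
        results = list(pool.map(generate_on_device, devices, chunks))
    return [item for chunk in results for item in chunk]
```

With four prompts and the two default devices, the first two prompts go to cuda:0 and the last two to cuda:1, and no worker touches the other worker's device.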
Any thoughts on what may be the issue?