Expected all tensors to be on the same device: running base.to("cuda:0") and refiner.to("cuda:1") for model parallelism

Getting this error:

```
RuntimeError: Expected all tensors to be on the same device, but found at least two devices, cuda:1 and cuda:0! (when checking argument for argument index in method wrapper_CUDA__index_select)
```

I’ve tried this:

```python
import torch
from diffusers import (
    AutoencoderKL,
    StableDiffusionXLImg2ImgPipeline,
    StableDiffusionXLPipeline,
)

vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
)
base = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
    use_safetensors=True,
)
# The refiner reuses the base's second text encoder and the same VAE object.
refiner = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    text_encoder_2=base.text_encoder_2,
    vae=vae,
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

base.to("cuda:0")
refiner.to("cuda:1")

# Stage 1: base model, returning latents instead of a decoded image.
latents = base(
    "wow",
    num_inference_steps=20,
    output_type="latent",
).images

# latents.to("cuda:1")  # tried with and without this; both give the same error

# Stage 2: refiner, taking the base latents as its input image.
image = refiner(
    "wow",
    num_inference_steps=30,
    image=latents,
).images[0]
```
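A side note on the commented-out `latents.to("cuda:1")` line: `Tensor.to()` is not in place. It returns a new tensor and leaves `latents` unchanged, so calling it without assigning the result has no effect; it needs to be `latents = latents.to("cuda:1")`. This is probably not what raises the error (the same error appears either way), but it is still worth fixing. The sketch below shows the behavior on CPU, using a dtype change as a stand-in for a device move so it runs without GPUs.

```python
import torch

# Stand-in for the base output; dtype changes here play the role of
# moving between GPUs, so the example runs on a CPU-only machine.
latents = torch.zeros(1, 4, 8, 8, dtype=torch.float32)

# Tensor.to() returns a converted tensor; the original is left unchanged,
# so calling it without assigning the result does nothing useful.
latents.to(torch.float16)
print(latents.dtype)  # torch.float32

# Reassigning keeps the converted tensor.
# On a multi-GPU machine this would be: latents = latents.to("cuda:1")
latents = latents.to(torch.float16)
print(latents.dtype)  # torch.float16
```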

The error means that your base and refiner should be on the same device, but they are on different devices, ‘cuda:0’ and ‘cuda:1’.

So either move both base and refiner to ‘cuda:0’ (or both to ‘cuda:1’), or use distributed training to train on both devices.

Thank you for trying to answer the question.
What I want to learn is how to run inference with both the base and refiner models while using both GPUs.
I want to do model parallelism, just as the title says. 😉

I’m not asking what the error means or why it happens. I’m asking how to do model parallelism during inference, i.e. how to put each model on its own CUDA device.

Perhaps the refiner needs its own text encoder to be on cuda:1 too?
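That guess points in the right direction. In the script above, `refiner` is built with `text_encoder_2=base.text_encoder_2` and `vae=vae`, so both pipelines hold the *same* module objects. `nn.Module.to()` converts a module's parameters in place, and a pipeline's `.to()` moves each of its component modules, so `refiner.to("cuda:1")` also moves the shared text encoder and VAE away from `base`, leaving `base` split across `cuda:0` and `cuda:1`. That mixed state is the most likely source of the `index_select` error, which is what an embedding lookup inside a text encoder calls. The self-contained sketch below reproduces the effect with plain PyTorch on CPU; the `ModuleDict` containers are hypothetical stand-ins for the two pipelines, and a dtype change stands in for a device move.

```python
import torch
import torch.nn as nn

# Hypothetical stand-ins for the two pipelines: each has its own module
# plus one module shared between them (like text_encoder_2 and vae above).
shared = nn.Linear(4, 4)
base_parts = nn.ModuleDict({"unet": nn.Linear(4, 4), "text_encoder_2": shared})
refiner_parts = nn.ModuleDict({"unet": nn.Linear(4, 4), "text_encoder_2": shared})

# nn.Module.to() converts parameters in place; a dtype change stands in
# for a device move so this runs without GPUs.
base_parts.to(torch.float64)     # analogous to base.to("cuda:0")
refiner_parts.to(torch.float16)  # analogous to refiner.to("cuda:1")

# The shared module followed the *last* .to() call, so "base" is now mixed.
print(base_parts["unet"].weight.dtype)            # torch.float64
print(base_parts["text_encoder_2"].weight.dtype)  # torch.float16
print(base_parts["text_encoder_2"] is refiner_parts["text_encoder_2"])  # True
```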

I am asking because I want to learn and have yet to find any details on how to do it.

I mean, I could set up a deconstructed custom pipeline, but I’d expect this to be possible without defining each of the components myself.