Data-Parallel Multi-GPU Inference

I found the following statement in the accelerate.Accelerator.prepare() documentation:

> You don’t need to prepare a model if it is used only for inference without any kind of mixed precision.

For data-parallel multi-GPU inference, we want a copy of the model to reside on each GPU. How can we achieve that without passing the model through prepare()?

You just move the model to the device. Check out the new distributed inference tutorial, and install accelerate from dev to use the new API if you want split_between_processes. Otherwise, pass your dataloader to Accelerator.prepare() and do model.to(state.device).
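A minimal sketch of that second pattern (the model and dataloader below are placeholders for your own):

```python
import torch
from accelerate import Accelerator

accelerator = Accelerator()

# Placeholder model and dataset; substitute your own.
model = torch.nn.Linear(128, 10)
dataloader = torch.utils.data.DataLoader(torch.randn(64, 128), batch_size=8)

# Only the dataloader goes through prepare(): it gets sharded so each
# process iterates over a different subset of the data.
dataloader = accelerator.prepare(dataloader)

# The model is simply moved to this process's device -- no DDP wrapper,
# since we never compute gradients.
model = model.to(accelerator.device)
model.eval()

with torch.no_grad():
    for batch in dataloader:
        outputs = model(batch)
```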

Using the DDP wrapper on your model is only relevant when you want to update gradients (that’s what it’s designed for), so for inference just load the model onto the device normally.

Thanks @muellerzr for your reply. Is there any benefit in using split_between_processes() over accelerate.Accelerator().prepare() on a dataloader?

It’s useful if you don’t want to make a DataLoader, or if you have things that can’t easily go into one (like the prompts in that example).
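For example, splitting a plain list of prompts across processes could look roughly like this (the prompts and the generation step are placeholders):

```python
from accelerate import PartialState

state = PartialState()

# Placeholder prompts; in practice these might feed a text or image
# generation pipeline.
prompts = ["a photo of a cat", "a photo of a dog", "a watercolor landscape"]

# Each process receives its own slice of the list -- no DataLoader needed.
with state.split_between_processes(prompts) as local_prompts:
    for prompt in local_prompts:
        # Run your model or pipeline on `prompt` here.
        ...
```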

Understood, thanks @muellerzr!