Data-Parallel Multi-GPU Inference

Is there a reason you want data-parallel inference specifically, rather than using device_map/big model inference? Knowing that would help narrow down my recommendation.
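
For reference, this is the kind of setup I mean by device_map/big model inference — a minimal sketch, with the checkpoint name just a placeholder for whatever model you're actually loading:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "bigscience/bloom-7b1"  # placeholder checkpoint, swap in your own

tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map="auto" lets Accelerate shard the weights across all visible GPUs
# (and CPU/disk if needed), so one copy of a large model can serve a forward pass
# even when it doesn't fit on a single card.
model = AutoModelForCausalLM.from_pretrained(
    model_name,
    device_map="auto",
    torch_dtype=torch.float16,
)

# Inputs go to the device holding the first layers; generation runs across the shards.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

The distinction matters for the recommendation: device_map splits a single copy of the model across the GPUs (useful when the model doesn't fit on one card), whereas data parallelism puts a full copy on each GPU and splits the batch of requests between them (useful for throughput when the model does fit).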