Inference on Multi-GPU/Multi-Node

You can try using accelerate.
In this link you can see how to modify code similar to yours to integrate the accelerate library, which can handle the distributed setup for you.
I haven't worked with it directly in a while, so I may be forgetting specific details, like whether (and how) you need to pass it your nodes/GPUs, but I'm sure you can easily find all of that in the docs :slight_smile:

By the way, I just came across this recent post, which might also come in handy for your needs.