The last time this question was asked was in 2020, and I can’t seem to figure out how to do this. Is it possible?
The bottleneck of generation is the model forward pass, so being able to run the forward pass across multiple GPUs should do it.
I’m not knowledgeable about multi-GPU inference, especially in PyTorch, maybe @sgugger knows how to do it
(the answer will also be useful to me)
Check out FasterTransformer, an NVIDIA library focused on serving large Transformer models across many GPUs and nodes in a distributed manner.
I think you can use the accelerate package (Hugging Face Accelerate) from pip. I believe it can also train on multiple machines, though I don’t know whether those need to be homogeneous. I use it to switch from CPU to MPS on my Mac, and it has options for multi-GPU.
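To make that concrete, here is a minimal sketch of sharding a model across all visible GPUs for generation, using the `device_map="auto"` option in `transformers` (which relies on Accelerate under the hood). This assumes `transformers`, `accelerate`, and `torch` are installed; the checkpoint name is just an example placeholder.

```python
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "facebook/opt-1.3b"  # example checkpoint; swap in your own
tokenizer = AutoTokenizer.from_pretrained(model_name)

# device_map="auto" asks Accelerate to split the model's layers across
# every visible GPU (spilling to CPU if needed), so a single forward
# pass, and therefore generate(), runs across all of them.
model = AutoModelForCausalLM.from_pretrained(model_name, device_map="auto")

# model.device points at the device holding the first layers; inputs
# must start there, and Accelerate moves activations between shards.
inputs = tokenizer("Hello, my name is", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=20)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```

Note this splits the model itself (model parallelism), which is what you want when one GPU can’t hold the weights; it doesn’t speed up a model that already fits on one card.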