Generation is called twice when using two GPUs

Hello,

I’m loading two models onto two GPUs: a generator model on "cuda:0" and a classifier model on "cuda:1". Everything loads and runs, but for each prompt I get two generations and two classifications instead of just one. If anyone has run into this before and has hints on how to solve it, please let me know.

Hi there!
It sounds like your models are running twice instead of just once per prompt—super frustrating, but let’s figure it out! Here are a few things to check:

  1. Double-check your code flow – the generate/classify call might be happening twice without you noticing. A print statement or a call counter inside the function will tell you whether it really runs once or twice per prompt.

  2. Look at your DataLoader (if you’re using one) – duplicate entries in the dataset, or iterating over the same loader twice, will show up as duplicate runs.

  3. Check multiprocessing settings – if the script is launched with something like torchrun or accelerate launch across 2 GPUs, you get two processes, and each one runs the whole pipeline. That looks exactly like “everything happens twice”.

  4. Make sure the models are on the right GPUs – confirm that generation inputs go to cuda:0 and classification inputs go to cuda:1 (see the sketch after this list for one way to wire it up).

  5. Look for extra loops – if the call sits inside an outer loop or a batch of size 2, it will run twice per prompt.
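
For reference, here’s a minimal single-pass sketch of the kind of setup you described: one generate call on cuda:0, one classify call on cuda:1 per prompt. The model names and `max_new_tokens` are just placeholders since we haven’t seen your code – swap in whatever you actually use:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Placeholder checkpoints -- replace with your own models.
GEN_NAME = "gpt2"
CLS_NAME = "distilbert-base-uncased-finetuned-sst-2-english"

gen_tok = AutoTokenizer.from_pretrained(GEN_NAME)
gen_model = AutoModelForCausalLM.from_pretrained(GEN_NAME).to("cuda:0")

cls_tok = AutoTokenizer.from_pretrained(CLS_NAME)
cls_model = AutoModelForSequenceClassification.from_pretrained(CLS_NAME).to("cuda:1")


def generate_and_classify(prompt: str) -> tuple[str, int]:
    # If this prints twice for a single prompt, the caller (a loop, a
    # DataLoader, or a second process) is the source of the duplication,
    # not the models themselves.
    print(f"generate_and_classify called for: {prompt!r}")

    # Generation on cuda:0 -- inputs must live on the same device as the model.
    gen_inputs = gen_tok(prompt, return_tensors="pt").to("cuda:0")
    with torch.no_grad():
        gen_ids = gen_model.generate(**gen_inputs, max_new_tokens=50)
    text = gen_tok.decode(gen_ids[0], skip_special_tokens=True)

    # Classification on cuda:1 -- re-tokenize the generated text and move it over.
    cls_inputs = cls_tok(text, return_tensors="pt").to("cuda:1")
    with torch.no_grad():
        logits = cls_model(**cls_inputs).logits
    label = int(logits.argmax(dim=-1))

    return text, label


if __name__ == "__main__":  # the guard matters if any multiprocessing is involved
    out_text, out_label = generate_and_classify("Once upon a time")
    print(out_text, out_label)
```

If the print line inside `generate_and_classify` fires twice for one prompt, work backwards from whatever is calling it; if it fires once but you still see two outputs, the duplication is happening downstream (e.g. how the results are collected or logged).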

If you can share a bit of your code, we can dig into it together! :rocket:
