Generation is called twice when using two GPUs

Hello,

I’m loading two models onto two GPUs: a generator model on "cuda:0" and a classifier model on "cuda:1". Everything loads and runs, but for each prompt I get two generations and two classifications instead of just one. If anyone has run into this before and has hints on how to solve it, please let me know.

Hi there!
It sounds like your models are running twice instead of just once per prompt—super frustrating, but let’s figure it out! Here are a few things to check:

  1. Double-check your code flow – the generate/classify call might be happening twice without you noticing. A print statement or a call counter inside the function will tell you whether it really runs once or twice per prompt.

  2. Look at your DataLoader (if you’re using one) – duplicate entries in the dataset, or iterating over the same loader twice, will show up as duplicate runs.

  3. Check multiprocessing settings – if the script is launched with something like torchrun or accelerate launch across 2 GPUs, you get two processes, and each one runs the whole pipeline. That looks exactly like “everything happens twice”.

  4. Make sure the models are on the right GPUs – confirm that generation inputs go to cuda:0 and classification inputs go to cuda:1 (see the sketch after this list for one way to wire it up).

  5. Look for extra loops – if the call sits inside an outer loop or a batch of size 2, it will run twice per prompt.
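
For reference, here’s a minimal single-pass sketch of the kind of setup you described: one generate call on cuda:0, one classify call on cuda:1 per prompt. The model names and `max_new_tokens` are just placeholders since we haven’t seen your code – swap in whatever you actually use:

```python
import torch
from transformers import (
    AutoModelForCausalLM,
    AutoModelForSequenceClassification,
    AutoTokenizer,
)

# Placeholder checkpoints -- replace with your own models.
GEN_NAME = "gpt2"
CLS_NAME = "distilbert-base-uncased-finetuned-sst-2-english"

gen_tok = AutoTokenizer.from_pretrained(GEN_NAME)
gen_model = AutoModelForCausalLM.from_pretrained(GEN_NAME).to("cuda:0")

cls_tok = AutoTokenizer.from_pretrained(CLS_NAME)
cls_model = AutoModelForSequenceClassification.from_pretrained(CLS_NAME).to("cuda:1")


def generate_and_classify(prompt: str) -> tuple[str, int]:
    # If this prints twice for a single prompt, the caller (a loop, a
    # DataLoader, or a second process) is the source of the duplication,
    # not the models themselves.
    print(f"generate_and_classify called for: {prompt!r}")

    # Generation on cuda:0 -- inputs must live on the same device as the model.
    gen_inputs = gen_tok(prompt, return_tensors="pt").to("cuda:0")
    with torch.no_grad():
        gen_ids = gen_model.generate(**gen_inputs, max_new_tokens=50)
    text = gen_tok.decode(gen_ids[0], skip_special_tokens=True)

    # Classification on cuda:1 -- re-tokenize the generated text and move it over.
    cls_inputs = cls_tok(text, return_tensors="pt").to("cuda:1")
    with torch.no_grad():
        logits = cls_model(**cls_inputs).logits
    label = int(logits.argmax(dim=-1))

    return text, label


if __name__ == "__main__":  # the guard matters if any multiprocessing is involved
    out_text, out_label = generate_and_classify("Once upon a time")
    print(out_text, out_label)
```

If the print line inside `generate_and_classify` fires twice for one prompt, work backwards from whatever is calling it; if it fires once but you still see two outputs, the duplication is happening downstream (e.g. how the results are collected or logged).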

If you can share a bit of your code, we can dig into it together! :rocket:
