Loading two models onto two GPUs

I am trying to train two LLMs in a GAN-style setup. I have a few NVIDIA GPUs at my disposal. The problem is that the two models together do not fit on a single GPU, although I can train each one individually without any problem. I figured I need to load them onto two different GPUs, so I set

import os
# Make device IDs match the nvidia-smi ordering, then expose only physical GPU 2
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

after initializing the first model, hoping the second model would end up on a different GPU. However, it still fails with the following error:

Exception has occurred: RuntimeError
CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
  File "/home/zhan1130/maplecg_nfs/nanoGPT/train_gan.py", line 433, in <module>
    trainer.train()

I am not sure what the right way is to load them onto two different GPUs (I am using PyTorch, by the way). Any ideas? For reference, the sketch below shows the kind of explicit placement I was expecting to work.
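
This is a minimal sketch rather than my actual training code; TinyLM is just a placeholder model class so the example runs on its own, assuming at least two visible GPUs:

import torch
import torch.nn as nn

# TinyLM stands in for my real model class, only to make the sketch self-contained.
class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(768, 768)

    def forward(self, x):
        return self.proj(x)

generator = TinyLM().to("cuda:0")      # first model on GPU 0
discriminator = TinyLM().to("cuda:1")  # second model on GPU 1

# Each tensor has to live on the same device as the module consuming it,
# so the generator output is moved across before the discriminator sees it.
x = torch.randn(4, 768, device="cuda:0")
out_g = generator(x)
out_d = discriminator(out_g.to("cuda:1"))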