Loading two models onto two GPUs

I am trying to train two LLMs in a GAN-style setup. I have a few NVIDIA GPUs at my disposal. The problem is that the two models together do not fit on a single GPU, although I can train each one individually without any problem. I figured I need to load them onto two different GPUs, so I set

import os
# Make device IDs match the nvidia-smi ordering, then expose only physical GPU 2
os.environ["CUDA_DEVICE_ORDER"] = "PCI_BUS_ID"
os.environ["CUDA_VISIBLE_DEVICES"] = "2"

after initializing the first model, hoping the second model would end up on a different GPU. However, it still fails with the following error:

Exception has occurred: RuntimeError
CUDA error: out of memory
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with `TORCH_USE_CUDA_DSA` to enable device-side assertions.
  File "/home/zhan1130/maplecg_nfs/nanoGPT/train_gan.py", line 433, in <module>
    trainer.train()

I am not sure what the right way is to load them onto two different GPUs (I am using PyTorch, by the way). Any ideas? For reference, the sketch below shows the kind of explicit placement I was expecting to work.
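
This is a minimal sketch rather than my actual training code; TinyLM is just a placeholder model class so the example runs on its own, assuming at least two visible GPUs:

import torch
import torch.nn as nn

# TinyLM stands in for my real model class, only to make the sketch self-contained.
class TinyLM(nn.Module):
    def __init__(self):
        super().__init__()
        self.proj = nn.Linear(768, 768)

    def forward(self, x):
        return self.proj(x)

generator = TinyLM().to("cuda:0")      # first model on GPU 0
discriminator = TinyLM().to("cuda:1")  # second model on GPU 1

# Each tensor has to live on the same device as the module consuming it,
# so the generator output is moved across before the discriminator sees it.
x = torch.randn(4, 768, device="cuda:0")
out_g = generator(x)
out_d = discriminator(out_g.to("cuda:1"))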