Move Trainer out of GPU

Hello! I’m using the Trainer API (which is great) to train a causal language model (GPT-style) on 2 GPUs in parallel.

Now I need to train multiple similar models for a few epochs at a time, and I would like to iteratively swap the current Trainer out of the GPUs and swap the next one in. After a while the cycle restarts, so I need to keep the state of all the Trainers around at all times.
I would like to achieve something like:

while not done:
  for trainer in trainer_list:
    move_to_cuda(trainer)        # load this Trainer's state back onto the GPUs
    for epoch in range(epochs):
      trainer.train()
    move_out_of_cuda(trainer)    # free the VRAM for the next Trainer

I would also like to keep using both GPUs to train each model.

Does anyone know how I could move a (multi-GPU) Trainer out of VRAM?
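
For reference, this is roughly what I imagine the two helpers doing. It's just a sketch, and I'm not sure moving the model alone is enough to actually free the VRAM, since the Trainer also holds the optimizer state and the multi-GPU wrapping:

import torch

def move_out_of_cuda(trainer):
    # Hypothetical helper: move the Trainer's model to CPU and release cached VRAM.
    # Probably not sufficient on its own -- the optimizer state and the
    # DataParallel/DDP wrapping presumably also need handling.
    trainer.model.to("cpu")
    torch.cuda.empty_cache()

def move_to_cuda(trainer):
    # Hypothetical helper: put the model back on the GPU before the next trainer.train().
    trainer.model.to("cuda")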

edit: typos