Move Trainer out of GPU

Hello! I’m using the Trainer API, which is great, to train a causal language model (GPT) on 2 GPUs in parallel.

Now I have to train multiple similar models for a few epochs at a time, and I would like to iteratively swap the current Trainer out of the GPUs and swap the next one in. After a while the cycle restarts, so I need to maintain the state of all the Trainers at all times.
I would like to achieve something like:

while not done:
  for trainer in trainer_list:
    for epoch in range(epochs):
      ...  # train this trainer for one epoch, then swap it out for the next

I would also like to keep using both GPUs to train each model.

Does anyone know how I could move a (multi-GPU) Trainer out of VRAM?
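For reference, here is roughly what I have in mind in plain PyTorch terms. This is just a sketch: `swap_in`/`swap_out` and the model list are placeholders I made up, not Trainer API calls, and the training step is elided.

```python
import torch
import torch.nn as nn

def swap_out(model: nn.Module) -> None:
    """Move a model's parameters to host memory and release cached VRAM."""
    model.to("cpu")
    if torch.cuda.is_available():
        torch.cuda.empty_cache()  # return cached blocks to the CUDA driver

def swap_in(model: nn.Module, device: torch.device) -> None:
    """Move a model's parameters onto the training device."""
    model.to(device)

device = torch.device("cuda" if torch.cuda.is_available() else "cpu")

# Stand-ins for the models wrapped by each Trainer in trainer_list.
models = [nn.Linear(8, 8) for _ in range(3)]

for model in models:
    swap_in(model, device)
    # ... train for a few epochs here (e.g. trainer.train()) ...
    swap_out(model)

# After one cycle, every model is back in host memory, while the
# Trainer objects (and their state) persist between cycles.
print(all(p.device.type == "cpu" for m in models for p in m.parameters()))
```

The open part for me is whether the Trainer keeps other references on the GPU (optimizer state, the wrapped DataParallel/DDP model) that a plain `model.to("cpu")` would not release.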

edit: typos