Imbalanced memory usage on multiple GPUs

Takalo · November 6, 2021, 8:32pm

Hi,

I am using the Trainer API to train a BART model.

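from transformers import (
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

# model, tokenizer, train_dataset and val_dataset are created earlier (not shown).
training_args = Seq2SeqTrainingArguments(
    output_dir='./models/bart',
    evaluation_strategy="epoch",    # evaluate at the end of every epoch
    learning_rate=2e-5,
    num_train_epochs=5,
    per_device_train_batch_size=2,  # batch size per GPU
    per_device_eval_batch_size=2,
    warmup_steps=500,
    weight_decay=0.01,
    predict_with_generate=True,     # use generate() during evaluation
)

# Pads inputs and labels dynamically for each batch.
data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
)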

I found that the memory usage is imbalanced when training on multiple GPUs:

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A     14760      C   python                          10513MiB |
|    1   N/A  N/A     14760      C   python                           4811MiB |
|    2   N/A  N/A     14760      C   python                           4811MiB |
|    3   N/A  N/A     14760      C   python                           4811MiB |
|    4   N/A  N/A     14760      C   python                           4811MiB |
|    5   N/A  N/A     14760      C   python                           4811MiB |
|    6   N/A  N/A     14760      C   python                           4811MiB |
|    7   N/A  N/A     14760      C   python                           4811MiB |
+-----------------------------------------------------------------------------+

Is there a way to balance the memory usage?
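
To put a number on the imbalance, the process table above can be parsed with a few lines of Python. This is only a minimal sketch: the helper names `memory_per_gpu` and `extra_on_main_gpu` and the regular expression are illustrative assumptions, and the rows are copied from the output above.

```python
import re

# Process-table rows copied from the nvidia-smi output in this post.
NVIDIA_SMI_ROWS = """\
|    0   N/A  N/A     14760      C   python                          10513MiB |
|    1   N/A  N/A     14760      C   python                           4811MiB |
|    2   N/A  N/A     14760      C   python                           4811MiB |
|    3   N/A  N/A     14760      C   python                           4811MiB |
|    4   N/A  N/A     14760      C   python                           4811MiB |
|    5   N/A  N/A     14760      C   python                           4811MiB |
|    6   N/A  N/A     14760      C   python                           4811MiB |
|    7   N/A  N/A     14760      C   python                           4811MiB |
"""

# Hypothetical pattern: GPU index in the first column, "<number>MiB" in the last.
ROW_PATTERN = re.compile(r"\|\s+(\d+)\s+.*?(\d+)MiB\s+\|")


def memory_per_gpu(table):
    """Return {gpu_index: MiB used}, summing rows that share a GPU index."""
    usage = {}
    for line in table.splitlines():
        match = ROW_PATTERN.match(line)
        if match:
            gpu, mib = int(match.group(1)), int(match.group(2))
            usage[gpu] = usage.get(gpu, 0) + mib
    return usage


def extra_on_main_gpu(usage, main=0):
    """Return how many MiB the main GPU uses beyond the busiest other GPU."""
    others = [mib for gpu, mib in usage.items() if gpu != main]
    return usage[main] - max(others)


if __name__ == "__main__":
    usage = memory_per_gpu(NVIDIA_SMI_ROWS)
    print("extra MiB on GPU 0:", extra_on_main_gpu(usage))
    print("ratio GPU 0 / GPU 1:", round(usage[0] / usage[1], 2))
```

With these rows, GPU 0 holds 5702 MiB more than each of the other GPUs, a bit more than twice their usage.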

ehalit · November 8, 2021, 5:18am

The reason for this, as far as I know, is that GPU 0 acts as the main device: it keeps the master copy of the model that the copies on GPUs 1-7 are made from. The gradients computed on GPUs 1-7 are brought back to GPU 0 during the backward pass to synchronize all the copies. After the update, the newly obtained model parameters are distributed again to GPUs 1-7. The forward pass is distributed; the backward pass is synchronized.

So one GPU has to hold the extra state that keeps the copies on the other GPUs in sync. Currently, I am not aware of a method to reduce the memory usage on the main GPU.
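
The scatter, compute, reduce and update cycle described above can be sketched as a toy model in plain Python. The single PID on all eight GPUs in the table suggests one process driving every device, which is the pattern `torch.nn.DataParallel` follows. The code below is not PyTorch's implementation: the function name `data_parallel_step`, the dict-based "devices" and the 1-D model are all assumptions made for illustration, and the batch size is assumed to be divisible by the number of devices.

```python
# Toy model of single-process data parallelism. Each "device" is a dict that
# lists what that GPU would have to hold. Illustration only, not PyTorch code.


def data_parallel_step(master_weight, batch, num_devices, lr=0.01):
    """Run one SGD step for the 1-D model y = w * x with squared-error loss."""
    devices = [{} for _ in range(num_devices)]
    # Device 0 permanently owns the master parameter.
    devices[0]["master_weight"] = master_weight

    # 1. Scatter the batch and copy the current weight to every device.
    chunk = len(batch) // num_devices  # assumes an evenly divisible batch
    for i, dev in enumerate(devices):
        dev["inputs"] = batch[i * chunk:(i + 1) * chunk]
        dev["replica"] = devices[0]["master_weight"]

    # 2. Every device runs forward and backward on its own chunk.
    for dev in devices:
        w = dev["replica"]
        dev["local_grad"] = sum(
            2 * (w * x - y) * x for x, y in dev["inputs"]
        ) / len(batch)

    # 3. The local gradients are reduced (summed) onto device 0.
    devices[0]["reduced_grad"] = sum(dev["local_grad"] for dev in devices)

    # 4. The parameter update happens only on device 0.
    devices[0]["master_weight"] -= lr * devices[0]["reduced_grad"]
    return devices


if __name__ == "__main__":
    batch = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
    for index, dev in enumerate(data_parallel_step(0.0, batch, num_devices=2)):
        print(index, sorted(dev))
```

In this sketch device 0 ends up holding everything the other devices hold plus the master parameter and the reduced gradient, which mirrors why the main GPU uses more memory. When a Trainer script is instead launched with one process per GPU (for example through `torchrun`), Trainer uses `DistributedDataParallel`, where each process keeps its own parameters, gradients and optimizer state, so memory usage is typically much more even.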

Takalo · November 8, 2021, 1:19pm

Thanks for your reply!

qibowang · December 28, 2023, 1:36am

Have you found a solution to the problem?
