Hi,
I am using the Trainer API to train a BART model with the following setup:
from transformers import DataCollatorForSeq2Seq, Seq2SeqTrainer, Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir='./models/bart',
    evaluation_strategy="epoch",
    learning_rate=2e-5,
    num_train_epochs=5,
    per_device_train_batch_size=2,
    per_device_eval_batch_size=2,
    warmup_steps=500,
    weight_decay=0.01,
    predict_with_generate=True,
)

data_collator = DataCollatorForSeq2Seq(tokenizer, model=model)

trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=val_dataset,
    data_collator=data_collator,
    tokenizer=tokenizer,
)
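For context, the whole job runs as a single process that sees all eight GPUs. As a quick sanity check (just a rough sketch, not part of my actual training script), something like this shows how many devices the Trainer will use; as far as I understand, with a single-process launch and more than one visible GPU the Trainer wraps the model in torch.nn.DataParallel:

import torch

# How many GPUs are visible, and how many will the Trainer use?
# With a single-process launch (no torchrun / no distributed init),
# n_gpu > 1 means the model is wrapped in torch.nn.DataParallel.
print("visible GPUs:", torch.cuda.device_count())
print("GPUs used by the Trainer:", training_args.n_gpu)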
I found that the memory usage is imbalanced when training on multiple GPUs:
+-----------------------------------------------------------------------------+
| Processes: |
| GPU GI CI PID Type Process name GPU Memory |
| ID ID Usage |
|=============================================================================|
| 0 N/A N/A 14760 C python 10513MiB |
| 1 N/A N/A 14760 C python 4811MiB |
| 2 N/A N/A 14760 C python 4811MiB |
| 3 N/A N/A 14760 C python 4811MiB |
| 4 N/A N/A 14760 C python 4811MiB |
| 5 N/A N/A 14760 C python 4811MiB |
| 6 N/A N/A 14760 C python 4811MiB |
| 7 N/A N/A 14760 C python 4811MiB |
+-----------------------------------------------------------------------------+
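If it helps with diagnosing this, I can also log the memory from inside the process with a small callback along these lines (a rough sketch; it only counts PyTorch-allocated tensors, so the numbers will be lower than what nvidia-smi reports, which includes the CUDA context):

import torch
from transformers import TrainerCallback

class GpuMemoryCallback(TrainerCallback):
    # Print PyTorch-allocated memory per visible GPU at every logging step.
    def on_log(self, args, state, control, logs=None, **kwargs):
        for i in range(torch.cuda.device_count()):
            mib = torch.cuda.memory_allocated(i) / 2**20
            print(f"cuda:{i}: {mib:.0f} MiB allocated")

trainer.add_callback(GpuMemoryCallback())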
Is there a way to balance the memory usage across the GPUs?