I'm training using the Trainer class on a multi-GPU setup.
I know that when using Accelerate (see "Comparing performance between different device setups"), in order to train with the desired learning rate we have to explicitly multiply it by the number of GPUs.
Is that also the case when using the Trainer class?
In the case of warmup steps: should the same be applied, i.e. n_warmup_steps *= n_gpus?
In the case of a learning rate scheduler: should the same be applied too?
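For concreteness, I mean something like the following sketch (just an illustration of my question; the base values and the * n_gpus scaling are placeholders, not something I know to be correct):

```python
import torch
from transformers import TrainingArguments

n_gpus = max(torch.cuda.device_count(), 1)

base_lr = 5e-5       # learning rate that worked on a single GPU
base_warmup = 500    # warmup steps that worked on a single GPU

# Is this scaling needed with the Trainer, or does it handle it internally?
args = TrainingArguments(
    output_dir="out",
    learning_rate=base_lr * n_gpus,
    warmup_steps=base_warmup * n_gpus,
    per_device_train_batch_size=8,
)
```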
Thanks for answering. So if I pass some lr either to TrainingArguments' learning_rate or to the Trainer's optimizers argument, backprop actually occurs with lr / n_gpus. Is my understanding correct?
In that case, wouldn't it be less prone to confusion to call it (similarly to the batch size) learning_rate_per_device?
Not necessarily. It's a heuristic that people recommend, but it's also recommended to test it yourself, at your discretion.
What's really happening is that the number of times the learning rate scheduler gets stepped increases, so if you want the same LR schedule going from situation A to situation B you should try multiplying the learning rate.
However, again: test it yourself first. Sometimes it's not necessary.
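If you do want to try the linear-scaling heuristic with the Trainer, a minimal sketch could look like the following (the base values are placeholders; whether the scaling actually helps is something to verify on your own setup). Using warmup_ratio instead of warmup_steps also sidesteps the warmup-steps question, since the ratio is applied to whatever the total number of optimization steps turns out to be:

```python
import torch
from transformers import TrainingArguments

world_size = max(torch.cuda.device_count(), 1)

base_lr = 2e-5  # LR tuned in the single-GPU run

args = TrainingArguments(
    output_dir="out",
    learning_rate=base_lr * world_size,  # linear-scaling heuristic; test before trusting it
    per_device_train_batch_size=8,
    warmup_ratio=0.1,                    # warmup defined as a fraction of total steps
    lr_scheduler_type="linear",
)
```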