No, it’s not ignored. Adafactor will use that value as an initial “external” lr. I’ve found that Adafactor works best without a learning rate set, though, as it does a pretty good job of adjusting it internally:
from transformers.optimization import Adafactor

optimizer = Adafactor(
    model.parameters(),
    scale_parameter=True,
    relative_step=True,
    warmup_init=True,
    lr=None,
)
You’ll need a scheduler too.
You also need to make sure you have a warmup period, so that Adafactor can adjust its learning rate before training proper begins. Roughly 5-10% of your total training steps works well; see the sketch below.
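A minimal sketch of that calculation (the 5% ratio and the step count below are just illustrative assumptions, not values fixed by Adafactor):

def warmup_steps(total_steps: int, warmup_ratio: float = 0.05) -> int:
    # Reserve a fraction of the total training steps for warmup (5-10% suggested).
    return int(total_steps * warmup_ratio)

# e.g. 10,000 total training steps with a 5% warmup -> 500 warmup steps
print(warmup_steps(10_000, 0.05))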
It’s not as fast as AdamW, but from my experience training Whisper, Adafactor provides superior results with less overhead.
Or use this with an explicit lr if you wish:
from transformers.optimization import Adafactor, AdafactorSchedule

optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    eps=(1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.05,
    relative_step=False,
    scale_parameter=False,
    warmup_init=False,
)  # If you don't set an lr, set the last three options to True instead (as in the first example).

# Since Adafactor performs its own scheduling, this class creates a proxy object
# that retrieves the current lr values from the optimizer.
lr_scheduler = AdafactorSchedule(optimizer)
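If you’re fine-tuning Whisper with the Trainer API, you can hand both objects over through the optimizers argument. A minimal sketch, assuming you already have model, training_args, the datasets, and data_collator from your own Whisper fine-tuning setup (those names are placeholders here):

from transformers import Seq2SeqTrainer

# `model`, `training_args`, `train_dataset`, `eval_dataset` and `data_collator`
# are placeholders for your own Whisper fine-tuning setup.
trainer = Seq2SeqTrainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    data_collator=data_collator,
    optimizers=(optimizer, lr_scheduler),  # use the Adafactor + AdafactorSchedule pair instead of the defaults
)
trainer.train()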