I’ve looked through topics about fine-tuning T5 models and saw recommendations to use AdaFactor:
" Additional training tips:
- T5 models need a slightly higher learning rate than the default one set in the
Trainer
when using the AdamW optimizer. Typically, 1e-4 and 3e-4 work well for most problems (classification, summarization, translation, question answering, question generation). Note that T5 was pre-trained using the AdaFactor optimizer."
Is there a way to set an optimizer other than `"optimizer": "adamw_torch"`?
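For context, here is roughly what my setup looks like right now. This is just a minimal sketch: the output directory and hyperparameters are placeholders, and I'm assuming the `optim` argument of `Seq2SeqTrainingArguments` is the setting that corresponds to `"optimizer": "adamw_torch"`:

```python
# Minimal sketch of my current configuration (placeholder values).
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="t5-finetune",          # placeholder path
    learning_rate=3e-4,                # within the 1e-4 to 3e-4 range suggested for T5 with AdamW
    optim="adamw_torch",               # the default optimizer; this is the setting I'd like to change
    per_device_train_batch_size=8,
    num_train_epochs=3,
)
```

Is there a supported value of `optim` for something like Adafactor, or do I need to construct the optimizer myself and pass it to the `Trainer` directly?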