I’ve looked through topics about fine-tuning T5 models and saw recommendations to use AdaFactor:
" Additional training tips:
- T5 models need a slightly higher learning rate than the default one set in the
Trainer
when using the AdamW optimizer. Typically, 1e-4 and 3e-4 work well for most problems (classification, summarization, translation, question answering, question generation). Note that T5 was pre-trained using the AdaFactor optimizer."
Is there a way to set an optimizer other than `"optimizer": "adamw_torch"`?
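For context, here is roughly what my setup looks like right now. This is just a minimal sketch: the output directory and hyperparameters are placeholders, and I'm assuming the `optim` argument of `Seq2SeqTrainingArguments` is the setting that corresponds to `"optimizer": "adamw_torch"`:

```python
# Minimal sketch of my current configuration (placeholder values).
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="t5-finetune",          # placeholder path
    learning_rate=3e-4,                # within the 1e-4 to 3e-4 range suggested for T5 with AdamW
    optim="adamw_torch",               # the default optimizer; this is the setting I'd like to change
    per_device_train_batch_size=8,
    num_train_epochs=3,
)
```

Is there a supported value of `optim` for something like Adafactor, or do I need to construct the optimizer myself and pass it to the `Trainer` directly?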