Seq2seq optimizer

I’ve looked through topics about fine-tuning T5 models and saw recommendations to use AdaFactor:
" Additional training tips:

  • T5 models need a slightly higher learning rate than the default one set in the Trainer when using the AdamW optimizer. Typically, 1e-4 and 3e-4 work well for most problems (classification, summarization, translation, question answering, question generation). Note that T5 was pre-trained using the AdaFactor optimizer."

Is there a way to set an optimizer other than `"optimizer": "adamw_torch"`, e.g. Adafactor?
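
For context, here is a minimal sketch of my current setup (the model name, output directory, learning rate, batch size, and epoch count are just placeholders, and the datasets are omitted):

```python
from transformers import AutoModelForSeq2SeqLM, Seq2SeqTrainingArguments, Seq2SeqTrainer

# Placeholder model -- just to show where the optimizer choice lives.
model = AutoModelForSeq2SeqLM.from_pretrained("t5-small")

training_args = Seq2SeqTrainingArguments(
    output_dir="./t5-finetuned",
    learning_rate=3e-4,           # higher LR, as suggested in the tips above
    optim="adamw_torch",          # <-- can this be changed to Adafactor?
    per_device_train_batch_size=8,
    num_train_epochs=3,
)

# trainer = Seq2SeqTrainer(model=model, args=training_args, ...)  # datasets omitted
# trainer.train()
```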