I am optimizing the fine-tuning of my T5 model and came across this entry on the HF website (Optimization),
where it is recommended to “use scheduled LR warm-up” with Adafactor when training T5. What does that mean, and how can I implement it? Has anyone done this before?
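
For reference, here is a minimal sketch of what a warm-up to a fixed LR might look like, assuming a linear ramp over a fixed number of steps. The function name `warmup_factor`, the step counts, and the base LR of 1e-3 are illustrative assumptions, not values taken from the docs.

```python
def warmup_factor(step: int, warmup_steps: int = 1000) -> float:
    """Multiplier for the base LR: linear ramp from 0 to 1, then constant.

    `warmup_steps` is an assumed hyperparameter, not a recommended value.
    """
    if warmup_steps <= 0:
        return 1.0
    return min(1.0, step / warmup_steps)


base_lr = 1e-3  # assumed fixed LR reached once warm-up ends

# Effective LR for the first few steps with a tiny warm-up of 4 steps.
lrs = [base_lr * warmup_factor(s, warmup_steps=4) for s in range(7)]
print(lrs)
```

In PyTorch, a function like this could probably be passed as `lr_lambda` to `torch.optim.lr_scheduler.LambdaLR`, with Adafactor created with `relative_step=False` and an explicit `lr` so that the external schedule controls the learning rate. The `transformers` library also provides `get_constant_schedule_with_warmup`, which appears to do the same thing.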
Many thanks in advance.