T5 Finetuning Tips

Hi,

Sorry for the frequent posts.

I tried fine-tuning T5 without the --fp16 option, and the results seem to be better than when I used it.
However, it still tends to generate longer sentences than other Seq2SeqLMs (e.g. BART-large) do, and extra tokens still appear in the output. In particular, <extra_id_0> is generated at the beginning of the sentence.
Is this something that can be avoided by setting model.config.task_specific_params (or something similar) appropriately? A rough sketch of what I mean is below.
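
For concreteness, here is a minimal sketch of the kind of thing I have in mind. The checkpoint name and all the parameter values are just placeholders, and I don't know whether the config's task-specific defaults or the generate() arguments are the right lever here:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-base")  # placeholder checkpoint
tokenizer = T5Tokenizer.from_pretrained("t5-base")

# Inspect the task-specific defaults shipped with the checkpoint
# (e.g. max_length, num_beams, prefix for each task).
print(model.config.task_specific_params)

# Assumed workaround: cap the output length and forbid the
# sentinel token <extra_id_0> from being generated at all.
extra_id_0 = tokenizer.convert_tokens_to_ids("<extra_id_0>")
inputs = tokenizer("summarize: some long article text ...", return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    max_length=60,                 # cap on output length (placeholder value)
    length_penalty=0.6,            # < 1.0 nudges beam search toward shorter outputs
    num_beams=4,
    bad_words_ids=[[extra_id_0]],  # block the sentinel token from being produced
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```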

Thank you.