T5 Finetuning Tips

Hi,

Sorry for the frequent posts.

I tried fine-tuning T5 without the --fp16 option, and the results seem to be better than when I used it.
However, it still tends to generate longer sentences than other Seq2SeqLMs (e.g. BART-large) do, and extra tokens still appear in the output. In particular, <extra_id_0> is generated at the beginning of the sentence.
Is this something that can be avoided by setting model.config.task_specific_params (or something similar) appropriately? A rough sketch of what I mean is below.
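
For concreteness, here is a minimal sketch of the kind of thing I have in mind. The checkpoint name and all the parameter values are just placeholders, and I don't know whether the config's task-specific defaults or the generate() arguments are the right lever here:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

model = T5ForConditionalGeneration.from_pretrained("t5-base")  # placeholder checkpoint
tokenizer = T5Tokenizer.from_pretrained("t5-base")

# Inspect the task-specific defaults shipped with the checkpoint
# (e.g. max_length, num_beams, prefix for each task).
print(model.config.task_specific_params)

# Assumed workaround: cap the output length and forbid the
# sentinel token <extra_id_0> from being generated at all.
extra_id_0 = tokenizer.convert_tokens_to_ids("<extra_id_0>")
inputs = tokenizer("summarize: some long article text ...", return_tensors="pt")
outputs = model.generate(
    inputs.input_ids,
    max_length=60,                 # cap on output length (placeholder value)
    length_penalty=0.6,            # < 1.0 nudges beam search toward shorter outputs
    num_beams=4,
    bad_words_ids=[[extra_id_0]],  # block the sentinel token from being produced
    early_stopping=True,
)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```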

Thank you.