T5 Finetuning Tips

Hello,

I’m sorry for asking such a basic question. I’m having trouble fine-tuning T5/mT5, and I’m hoping for your help.

I’m trying to fine-tune the pre-trained t5-base, t5-large, mt5-base, etc., but the fine-tuned models generate target sentences containing many sentinel tokens, such as <extra_id_0>, <extra_id_1>, and <extra_id_2>. This is especially noticeable with t5-large.
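For reference, I generate with the standard generate/decode calls, roughly like this (simplified; the checkpoint name and input sentence are just placeholders, and in practice I load my fine-tuned model):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")      # placeholder checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")  # in practice, my fine-tuned model

inputs = tokenizer("some source sentence", return_tensors="pt")  # placeholder input
output_ids = model.generate(**inputs, max_length=128)

# This is where the <extra_id_0>, <extra_id_1>, ... tokens show up in the decoded text
print(tokenizer.batch_decode(output_ids, skip_special_tokens=False))
```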

I’m using the --fp16 option, and the dataset has between 10K and 100K examples.

The training parameters are almost the same as those of Seq2SeqTrainer in transformers v3.4.0 and v4.0.0-rc-1.
I have tried both with and without a task prefix, and neither gives good results; a simplified sketch of my setup is below.
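
For context, my training setup looks roughly like this (a simplified sketch, not my exact script: the model name, prefix, field names, toy data, and hyperparameters are all placeholders):

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model_name = "t5-base"  # also tried t5-large and mt5-base
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

prefix = "my task: "  # hypothetical prefix; I have also tried leaving it out

# Toy stand-in for my 10K-100K example dataset
raw = Dataset.from_dict({
    "source": ["first source sentence", "second source sentence"],
    "target": ["first target sentence", "second target sentence"],
})

def preprocess(batch):
    inputs = tokenizer([prefix + s for s in batch["source"]],
                       max_length=512, truncation=True)
    # For T5/mT5 the targets can be tokenized with the same tokenizer
    labels = tokenizer(batch["target"], max_length=128, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

dataset = raw.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="out",
    fp16=True,                       # the --fp16 option mentioned above
    per_device_train_batch_size=8,   # placeholder
    learning_rate=3e-4,              # placeholder
    num_train_epochs=3,              # placeholder
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```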

I’m not sure whether this is a matter of adjusting the training parameters or of how I pre-process the dataset, and I’m wondering where to start debugging my code.

I would be grateful for your advice.