T5 Finetuning Tips

Hello,

I’m sorry for asking such a basic question. I’m having trouble fine-tuning T5/mT5, and I’m hoping for your help.

I’m trying to fine-tune the pre-trained t5-base, t5-large, mt5-base, etc., but the fine-tuned models generate target sentences containing many sentinel tokens, such as <extra_id_0>, <extra_id_1>, and <extra_id_2>. This is especially noticeable with t5-large.
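For reference, I generate with the standard generate/decode calls, roughly like this (simplified; the checkpoint name and input sentence are just placeholders, and in practice I load my fine-tuned model):

```python
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

tokenizer = AutoTokenizer.from_pretrained("t5-base")      # placeholder checkpoint
model = AutoModelForSeq2SeqLM.from_pretrained("t5-base")  # in practice, my fine-tuned model

inputs = tokenizer("some source sentence", return_tensors="pt")  # placeholder input
output_ids = model.generate(**inputs, max_length=128)

# This is where the <extra_id_0>, <extra_id_1>, ... tokens show up in the decoded text
print(tokenizer.batch_decode(output_ids, skip_special_tokens=False))
```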

I’m using the --fp16 option, and the dataset has between 10K and 100K examples.

The training parameters are almost the same as those of Seq2SeqTrainer in transformers v3.4.0 and v4.0.0-rc-1.
I have tried both with and without a task prefix, and neither gives good results; a simplified sketch of my setup is below.
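
For context, my training setup looks roughly like this (a simplified sketch, not my exact script: the model name, prefix, field names, toy data, and hyperparameters are all placeholders):

```python
from datasets import Dataset
from transformers import (
    AutoTokenizer,
    AutoModelForSeq2SeqLM,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainingArguments,
    Seq2SeqTrainer,
)

model_name = "t5-base"  # also tried t5-large and mt5-base
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

prefix = "my task: "  # hypothetical prefix; I have also tried leaving it out

# Toy stand-in for my 10K-100K example dataset
raw = Dataset.from_dict({
    "source": ["first source sentence", "second source sentence"],
    "target": ["first target sentence", "second target sentence"],
})

def preprocess(batch):
    inputs = tokenizer([prefix + s for s in batch["source"]],
                       max_length=512, truncation=True)
    # For T5/mT5 the targets can be tokenized with the same tokenizer
    labels = tokenizer(batch["target"], max_length=128, truncation=True)
    inputs["labels"] = labels["input_ids"]
    return inputs

dataset = raw.map(preprocess, batched=True)

args = Seq2SeqTrainingArguments(
    output_dir="out",
    fp16=True,                       # the --fp16 option mentioned above
    per_device_train_batch_size=8,   # placeholder
    learning_rate=3e-4,              # placeholder
    num_train_epochs=3,              # placeholder
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=dataset,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
    tokenizer=tokenizer,
)
trainer.train()
```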

I’m not sure whether this is a matter of adjusting the training parameters or of how I pre-process the dataset, and I’m wondering where to start debugging my code.

I would be grateful for your advice.