Hello,
I’m sorry for asking such a basic question. I’m having trouble fine-tuning T5/mT5, and I’m hoping for your help.
I’m trying to fine-tune the pre-trained `t5-base`, `t5-large`, `mt5-base`, etc., but the fine-tuned model seems to generate target sentences containing many extra tokens, such as `<extra_id_0>`, `<extra_id_1>`, `<extra_id_2>`, and so on. This is especially noticeable when I use `t5-large`.
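For reference, this is roughly how I generate and decode predictions; the checkpoint path and the input sentence below are just placeholders, not my actual data:

```python
# Rough sketch of how I inspect the fine-tuned model's output.
# The checkpoint path and the input sentence are placeholders.
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_dir = "output/checkpoint-xxxx"  # placeholder path to my fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_dir)
model = AutoModelForSeq2SeqLM.from_pretrained(model_dir)

inputs = tokenizer("an example source sentence", return_tensors="pt")
generated = model.generate(**inputs, max_length=128, num_beams=4)

# The decoded text is where I see <extra_id_0>, <extra_id_1>, ... appear.
print(tokenizer.decode(generated[0], skip_special_tokens=False))
```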
I’m using the `--fp16` option, and the dataset size is between 10K and 100K pairs. The training parameters are almost the same as those of `Seq2SeqTrainer` in transformers v3.4.0 and v4.0.0-rc-1.
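A rough sketch of my setup is below. The toy data and the hyperparameter values are placeholders rather than my exact settings, and note that in v3.4.0/v4.0.0-rc-1 `Seq2SeqTrainer` comes from the `examples/seq2seq` scripts rather than the library itself:

```python
# Rough sketch of my training setup; the data and values are placeholders.
from transformers import (
    AutoModelForSeq2SeqLM,
    AutoTokenizer,
    DataCollatorForSeq2Seq,
    Seq2SeqTrainer,
    Seq2SeqTrainingArguments,
)

model_name = "t5-large"  # also tried t5-base and mt5-base
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

def encode(source, target):
    # Tokenize one (source, target) pair into model inputs + labels.
    model_inputs = tokenizer(source, max_length=512, truncation=True)
    model_inputs["labels"] = tokenizer(target, max_length=128, truncation=True)["input_ids"]
    return model_inputs

# Placeholder examples; my real dataset has between 10K and 100K pairs.
train_dataset = [encode("an example source sentence", "an example target sentence")]
eval_dataset = list(train_dataset)

args = Seq2SeqTrainingArguments(
    output_dir="output",
    fp16=True,                       # the --fp16 option mentioned above
    learning_rate=3e-5,              # placeholder, close to the example defaults
    num_train_epochs=3,
    per_device_train_batch_size=8,
    predict_with_generate=True,
)

trainer = Seq2SeqTrainer(
    model=model,
    args=args,
    train_dataset=train_dataset,
    eval_dataset=eval_dataset,
    tokenizer=tokenizer,
    data_collator=DataCollatorForSeq2Seq(tokenizer, model=model),
)
trainer.train()
```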
I have tried training both with and without a task prefix and have not had good results with either.
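Concretely, by “with and without prefix” I mean preparing the inputs in one of these two ways (the prefix string here is only an example, not necessarily my actual task):

```python
# The two input formats I tried; "summarize: " is only an example prefix.
source = "an example source sentence"

input_with_prefix = "summarize: " + source   # T5-style task prefix prepended
input_without_prefix = source                # raw source text, no prefix
```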
I’m not sure whether this is a matter of adjusting the training parameters or of how I pre-process the datasets, and I’m wondering where to start debugging my code.
I would be grateful for your advice.