Things I’ve found
- Task prefixes matter when:
  1. doing multi-task training
  2. your task is similar or related to one of the supervised tasks used in the T5 pre-training mixture.
- T5 needs a slightly higher LR than the default one set in `Trainer`; in my experiments 1e-4 and 3e-4 worked for almost all problems (classification, QA, question generation, summarization).
- No need to pass `decoder_input_ids` to T5 yourself, just pass `labels` and the model will prepare them for you. `labels` should end with `eos_token`. (Important! This is where most of the mistakes happen; see the first sketch after this list.)
- T5 uses `pad_token` as the `decoder_start_token_id`, so when doing generation without the `generate` function make sure you start it with the pad token (see the second sketch after this list).
- Trimming batches when training on TPU leads to very slow training.
- Apparently, because of SentencePiece and some possible leakage of other languages in the C4 data, T5 gives somewhat sensible results for French. I fine-tuned it on FQuAD (the French version of SQuAD) for question generation and BLEU-4 against the dev set was 15.
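A minimal fine-tuning sketch for the prefix / `labels` / LR points above, assuming `t5-small`, a toy question-generation pair, and AdamW at 3e-4; the model name, example text, and hyperparameters here are my own illustrations, not from the experiments above:

```python
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# task prefix in front of the input, same idea as the T5 pre-training mixture
input_text = "generate question: The Eiffel Tower was completed in 1889."
target_text = "When was the Eiffel Tower completed?"

inputs = tokenizer(input_text, return_tensors="pt")

# labels must end with eos_token (</s>); recent tokenizer versions append it
# automatically when add_special_tokens=True, but it is worth checking
labels = tokenizer(target_text, return_tensors="pt").input_ids
assert labels[0, -1].item() == tokenizer.eos_token_id
# (when batching with padding, also set padded label positions to -100
#  so they are ignored by the loss)

# pass only `labels`; the model builds decoder_input_ids by shifting them right
outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
loss = outputs.loss

# slightly higher LR than the Trainer default, per the note above
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)
loss.backward()
optimizer.step()
```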
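And a minimal greedy-decoding sketch for the pad-token point, again assuming `t5-small` and an illustrative translation prompt; this is just one way to run the decoder step by step without `generate`:

```python
import torch
from transformers import T5ForConditionalGeneration, T5TokenizerFast

tokenizer = T5TokenizerFast.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small").eval()

enc = tokenizer("translate English to German: Hello, how are you?",
                return_tensors="pt")

# T5 uses pad_token as decoder_start_token_id, so the decoder input starts with it
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

with torch.no_grad():
    for _ in range(32):  # max new tokens
        logits = model(input_ids=enc.input_ids,
                       attention_mask=enc.attention_mask,
                       decoder_input_ids=decoder_input_ids).logits
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)  # greedy pick
        decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))
```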
Not sure if it's an issue or not, but in some cases using `label_smoothing` in T5 resulted in `nan` loss.
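For reference, this is a sketch of how one might enable label smoothing with today's `Seq2SeqTrainingArguments`; it may differ from the setup where I saw the `nan` loss, and the values are only examples:

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="t5-finetune",
    learning_rate=3e-4,           # per the LR note above
    label_smoothing_factor=0.1,   # smoothing value shown only as an example
    predict_with_generate=True,
)
# if the loss turns into nan with smoothing enabled, try setting it back to 0.0
```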