T5 Finetuning Tips

Things I’ve found

  • Task prefixes matter when:
    1. doing multi-task training
    2. your task is similar or related to one of the supervised tasks used in T5's pre-training mixture (see the prefix in the sketch after this list).
  • T5 needs a slightly higher learning rate than the default set in Trainer; in my experiments 1e-4 and 3e-4 worked for almost all problems (classification, QA, question generation, summarization).
  • No need to pass decoder_input_ids to T5 yourself, just pass labels and the model will prepare them for you. labels should end with the eos_token. (Important! This is where most of the mistakes happen; see the sketch after this list.)
  • T5 uses the pad_token as the decoder_start_token_id, so when doing generation without the generate function make sure you start decoding with the pad token (see the decoding sketch after this list).
  • Trimming batches (dynamic padding) when training on TPU leads to much slower training, since changing tensor shapes trigger recompilation.
  • Apparently, because of sentencepiece and some possible leakage of other languages into the C4 data, T5 gives somewhat sensible results for French. I fine-tuned it on FQuAD (the French version of SQuAD) for question generation and BLEU-4 against the dev set was 15.
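
To make the prefix / labels / eos points above concrete, here is a minimal sketch of a single training step (placeholder model size and example text, not my actual training script):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# Slightly higher LR than the Trainer default, per the tip above.
optimizer = torch.optim.AdamW(model.parameters(), lr=3e-4)

# Task prefix on the input, like the prefixes used in T5's pre-training mixture.
source = "summarize: studies have shown that owning a dog is good for you"
target = "owning a dog is good for you"

inputs = tokenizer(source, return_tensors="pt")
# The tokenizer appends </s> (eos) to the target by default, so the labels
# already end with the eos_token as they should.
labels = tokenizer(target, return_tensors="pt").input_ids

# Pass only labels; decoder_input_ids are built internally by shifting the
# labels right and prepending the pad token.
loss = model(**inputs, labels=labels).loss
loss.backward()
optimizer.step()
```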
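
And a rough sketch of greedy decoding without generate(), just to show where the pad token comes in as the decoder start (again a sketch, not my actual code):

```python
import torch
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")
model.eval()

input_ids = tokenizer("translate English to German: The house is wonderful.",
                      return_tensors="pt").input_ids

# Start the decoder with the pad token (T5's decoder_start_token_id), not bos/eos.
decoder_input_ids = torch.tensor([[model.config.decoder_start_token_id]])

with torch.no_grad():
    for _ in range(20):
        logits = model(input_ids=input_ids,
                       decoder_input_ids=decoder_input_ids).logits
        next_token = logits[:, -1, :].argmax(dim=-1, keepdim=True)
        decoder_input_ids = torch.cat([decoder_input_ids, next_token], dim=-1)
        if next_token.item() == tokenizer.eos_token_id:
            break

print(tokenizer.decode(decoder_input_ids[0], skip_special_tokens=True))
```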

Not sure if it’s an issue or not, but in some cases using label smoothing with T5 resulted in nan loss.
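
In case it helps someone reproduce this, label smoothing via the Trainer can be enabled roughly like this (a hedged sketch, not my exact setup; label_smoothing_factor is the current Trainer argument and output_dir is a placeholder):

```python
from transformers import Seq2SeqTrainingArguments

args = Seq2SeqTrainingArguments(
    output_dir="t5-finetune",      # placeholder path
    learning_rate=3e-4,
    label_smoothing_factor=0.1,    # the setting that in some cases gave nan loss for me
)
```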
