T5 questions that multiple people have asked and that I think I can answer. Correct me if I'm wrong! Quotes are from the paper.
Q: What masking objective did they use for pretraining?
A: Span corruption. From the paper:

"Specifically, we use a mean span length of 3 and corrupt 15% of the original sequence. We found that this objective produced marginally better performance (Table 7) while being slightly more computationally efficient due to shorter target sequence lengths."
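If it helps to see the mechanics, here is a toy Python sketch of span corruption (my own illustration, not code from the paper or the t5 repo; the function and variable names are mine): contiguous spans are replaced with sentinel tokens in the input, and the target is just the sentinels plus the dropped-out tokens. That short target is the "shorter target sequence lengths" the quote refers to.

```python
import random

def span_corrupt(tokens, corrupt_rate=0.15, mean_span_len=3, seed=0):
    """Toy span-corruption noising: replaces ~corrupt_rate of the tokens
    with sentinels and returns the (input, target) pair the model trains on."""
    rng = random.Random(seed)
    n = len(tokens)
    budget = max(1, round(n * corrupt_rate))

    # Sample spans (exponential lengths, mean ~= mean_span_len) until
    # roughly 15% of token positions are marked for corruption.
    corrupted = set()
    while len(corrupted) < budget:
        span_len = max(1, round(rng.expovariate(1 / mean_span_len)))
        start = rng.randrange(n)
        corrupted.update(range(start, min(start + span_len, n)))

    inputs, targets = [], []
    sentinel = 0
    i = 0
    while i < n:
        if i in corrupted:
            # Each corrupted span becomes one sentinel in the input...
            inputs.append(f"<extra_id_{sentinel}>")
            # ...and "sentinel + the original tokens" in the target.
            targets.append(f"<extra_id_{sentinel}>")
            while i < n and i in corrupted:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    targets.append(f"<extra_id_{sentinel}>")  # closing sentinel
    return inputs, targets

inp, tgt = span_corrupt("Thank you for inviting me to your party last week .".split())
print(" ".join(inp))   # e.g. Thank you <extra_id_0> to your party <extra_id_1> .
print(" ".join(tgt))   # e.g. <extra_id_0> for inviting me <extra_id_1> last week <extra_id_2>
```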
Q: Are the HF checkpoints trained with multi-tasking?
A: Yes.
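For reference, the HF checkpoints in question are the original t5-small / t5-base / t5-large / t5-3b / t5-11b weights on the Hub. A minimal loading example with transformers; because of the multi-task pretraining, task prefixes work without any fine-tuning:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Multi-task pretraining means prefixed tasks work out of the box:
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```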
Q: Do we have access to T5 1.1 checkpoints?
A: No, because they are not an obvious win. See "Should I use t5v1.1, t5narrow and TalkingHeads?" (Issue #266, google-research/text-to-text-transfer-transformer on GitHub).