T5 Finetuning Tips

Here are answers to T5 questions that multiple people have asked and that I think I know the answer to. Correct me if I’m wrong! Quotes are from the paper.

Q: What masking objective did they use for pretraining?
A: Span corruption.

Specifically, we use a mean span length of 3 and corrupt 15% of the original sequence. We found that this objective produced marginally better performance (Table 7) while being slightly more computationally efficient due to shorter target sequence lengths.
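To make that concrete, here is a minimal Python sketch of how span corruption turns one token sequence into an (inputs, targets) pair with sentinel tokens. The function name and the simplified span-sampling logic are my own illustration, not the code used in the paper or the official codebase.

```python
import random

def span_corrupt(tokens, corrupt_rate=0.15, mean_span_len=3, seed=0):
    """Toy span corruption: mask random spans and return (inputs, targets).

    Each masked span is replaced in the inputs by a sentinel token
    (<extra_id_0>, <extra_id_1>, ...); the targets list each sentinel
    followed by the tokens it replaced, ending with one final sentinel.
    """
    rng = random.Random(seed)
    n_to_corrupt = max(1, round(len(tokens) * corrupt_rate))
    n_spans = max(1, round(n_to_corrupt / mean_span_len))

    # Simplified span selection: sample start positions and drop overlaps.
    starts = sorted(rng.sample(range(len(tokens)), n_spans))

    inputs, targets = [], []
    cursor, sentinel = 0, 0
    for start in starts:
        if start < cursor:  # skip spans that overlap a previous one
            continue
        end = min(start + mean_span_len, len(tokens))
        inputs.extend(tokens[cursor:start])
        inputs.append(f"<extra_id_{sentinel}>")
        targets.append(f"<extra_id_{sentinel}>")
        targets.extend(tokens[start:end])
        sentinel += 1
        cursor = end
    inputs.extend(tokens[cursor:])
    targets.append(f"<extra_id_{sentinel}>")  # closing sentinel
    return inputs, targets

# Usage: with the paper's example sentence, roughly 15% of the 10 tokens
# (one span of up to 3 tokens here) gets masked out.
inp, tgt = span_corrupt("Thank you for inviting me to your party last week".split())
```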

Q: Are the HF checkpoints trained with multi-tasking?
A: Yes.
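A small sanity check (a sketch assuming the transformers library and the t5-small checkpoint): because the released checkpoints were pretrained on the multi-task mixture, they already respond to the paper's task prefixes without any finetuning.

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

# The "translate English to German:" prefix is one of the supervised tasks
# included in the multi-task pretraining mixture.
batch = tokenizer("translate English to German: The house is wonderful.",
                  return_tensors="pt")
outputs = model.generate(**batch, max_length=40)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
# Should print a German translation along the lines of "Das Haus ist wunderbar."
```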

Q: Do we have access to T5 1.1 checkpoints?
A: No, because they are not obvious wins; see "Should I use t5v1.1, t5narrow and TalkingHeads?" (Issue #266, google-research/text-to-text-transfer-transformer on GitHub).
