T5 questions that multiple people have asked and that I think I can answer. Correct me if I'm wrong! Quotes are from the paper.
Q: What masking objective did they use for pretraining?
A: Span corruption. From the paper:

"Specifically, we use a mean span length of 3 and corrupt 15% of the original sequence. We found that this objective produced marginally better performance (Table 7) while being slightly more computationally efficient due to shorter target sequence lengths."
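If it helps to see the mechanics, here is a toy Python sketch of span corruption (my own illustration, not code from the paper or the t5 repo; the function and variable names are mine): contiguous spans are replaced with sentinel tokens in the input, and the target is just the sentinels plus the dropped-out tokens. That short target is the "shorter target sequence lengths" the quote refers to.

```python
import random

def span_corrupt(tokens, corrupt_rate=0.15, mean_span_len=3, seed=0):
    """Toy span-corruption noising: replaces ~corrupt_rate of the tokens
    with sentinels and returns the (input, target) pair the model trains on."""
    rng = random.Random(seed)
    n = len(tokens)
    budget = max(1, round(n * corrupt_rate))

    # Sample spans (exponential lengths, mean ~= mean_span_len) until
    # roughly 15% of token positions are marked for corruption.
    corrupted = set()
    while len(corrupted) < budget:
        span_len = max(1, round(rng.expovariate(1 / mean_span_len)))
        start = rng.randrange(n)
        corrupted.update(range(start, min(start + span_len, n)))

    inputs, targets = [], []
    sentinel = 0
    i = 0
    while i < n:
        if i in corrupted:
            # Each corrupted span becomes one sentinel in the input...
            inputs.append(f"<extra_id_{sentinel}>")
            # ...and "sentinel + the original tokens" in the target.
            targets.append(f"<extra_id_{sentinel}>")
            while i < n and i in corrupted:
                targets.append(tokens[i])
                i += 1
            sentinel += 1
        else:
            inputs.append(tokens[i])
            i += 1
    targets.append(f"<extra_id_{sentinel}>")  # closing sentinel
    return inputs, targets

inp, tgt = span_corrupt("Thank you for inviting me to your party last week .".split())
print(" ".join(inp))   # e.g. Thank you <extra_id_0> to your party <extra_id_1> .
print(" ".join(tgt))   # e.g. <extra_id_0> for inviting me <extra_id_1> last week <extra_id_2>
```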
Q: Are the HF checkpoints trained with multi-tasking?
A: Yes.
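For reference, the HF checkpoints in question are the original t5-small / t5-base / t5-large / t5-3b / t5-11b weights on the Hub. A minimal loading example with transformers; because of the multi-task pretraining, task prefixes work without any fine-tuning:

```python
from transformers import T5ForConditionalGeneration, T5Tokenizer

tokenizer = T5Tokenizer.from_pretrained("t5-base")
model = T5ForConditionalGeneration.from_pretrained("t5-base")

# Multi-task pretraining means prefixed tasks work out of the box:
inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")
outputs = model.generate(**inputs)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```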
Q: Do we have access to T5 1.1 checkpoints?
A: No, because they are not an obvious win. See "Should I use t5v1.1, t5narrow and TalkingHeads?" (Issue #266, google-research/text-to-text-transfer-transformer on GitHub).