Difference in dimensions of T0 vs T5 models

Why is the hidden dimension (d_model) of T0 3B 2048, whereas that of T5 3B is 1024 (the same as in the original paper), even though T0 models should be based on the corresponding T5 models?

cc @VictorSanh

Hi @cookiemonster,
that’s a great question!

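T0 is based on T5+LM, which itself is based on T5 v1.1 rather than the original T5. T5 v1.1 uses different architecture hyperparameters: at the 3B (XL) size, d_model is 2048 instead of 1024.
T5+LM is essentially T5 v1.1 with some additional steps of autoregressive language modeling.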
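Both models can be called "3B" despite having different widths, and a rough parameter count shows why. The sketch below is a minimal illustration, assuming the size hyperparameters from the public Hugging Face configs for t5-3b and google/t5-v1_1-xl (which T0 3B is assumed to inherit via T5+LM). T5Dims and approx_params are hypothetical helpers, and the count covers weight matrices only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class T5Dims:
    """Size hyperparameters of a T5-style encoder-decoder (values below are assumed)."""
    vocab_size: int
    d_model: int
    d_ff: int
    num_heads: int
    d_kv: int
    num_layers: int      # layers per stack (encoder and decoder each)
    gated_ffn: bool      # T5 v1.1 uses a gated-GELU FFN: 3 matrices instead of 2
    tied_lm_head: bool   # original T5 ties the output head to the input embedding


def approx_params(c: T5Dims) -> int:
    """Count weight-matrix parameters only; layer norms and relative
    position biases are ignored (together they add well under 1M)."""
    inner = c.num_heads * c.d_kv                   # width of the attention projections
    attn = 4 * c.d_model * inner                   # q, k, v, o projections (no biases)
    ffn = (3 if c.gated_ffn else 2) * c.d_model * c.d_ff
    encoder = c.num_layers * (attn + ffn)          # self-attention + FFN
    decoder = c.num_layers * (2 * attn + ffn)      # self-attn + cross-attn + FFN
    embed = c.vocab_size * c.d_model
    head = 0 if c.tied_lm_head else embed          # untied LM head adds a second matrix
    return embed + head + encoder + decoder


# Assumed values: original T5 3B vs. T5 v1.1 XL.
T5_3B = T5Dims(32128, 1024, 16384, 32, 128, 24, gated_ffn=False, tied_lm_head=True)
T5_V1_1_XL = T5Dims(32128, 2048, 5120, 32, 64, 24, gated_ffn=True, tied_lm_head=False)

if __name__ == "__main__":
    for name, cfg in [("t5-3b", T5_3B), ("t5-v1_1-xl", T5_V1_1_XL)]:
        print(f"{name}: d_model={cfg.d_model}, ~{approx_params(cfg) / 1e9:.2f}B params")
    # t5-3b: d_model=1024, ~2.85B params
    # t5-v1_1-xl: d_model=2048, ~2.85B params
```

Under these assumptions, T5 v1.1 trades a much smaller d_ff (5120 vs. 16384) and narrower heads for a wider d_model, so the total stays at roughly the same size.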

Thanks Victor, this was really helpful!

I've updated the model description in case it makes things clearer for someone else.