Convert mT5 to HF weights?


I have been trying to convert the mT5 weights available here to HF weights for TFT5ForConditionalGeneration or T5ForConditionalGeneration. Any ideas on how to do this?
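For reference, converting a checkpoint by hand mostly means two things: renaming every TensorFlow variable to the matching key in the PyTorch `state_dict`, and transposing dense kernels (TF stores them as `[in, out]`, while `torch.nn.Linear.weight` is `[out, in]`). Below is a minimal sketch of the renaming step only. It assumes mesh-TensorFlow style variable names such as `encoder/block_000/layer_000/SelfAttention/q`. Those names are assumptions, so list the real ones in your checkpoint with `tf.train.list_variables` first. The function `tf_to_hf_key` and its rule tables are hypothetical and not part of any library.

```python
import re
from typing import Optional, Tuple

# Hypothetical rules. The TF-side names below are ASSUMED mesh-TF names;
# verify them against tf.train.list_variables(checkpoint_path).
_TOP = {
    "shared/embedding": "shared.weight",
    "encoder/final_layer_norm/scale": "encoder.final_layer_norm.weight",
    "decoder/final_layer_norm/scale": "decoder.final_layer_norm.weight",
    "decoder/logits/kernel": "lm_head.weight",  # untied output layer (T5.1.1 / mT5)
}

# Matches "<stack>/block_NNN/layer_NNN/<rest>".
_BLOCK = re.compile(r"^(encoder|decoder)/block_(\d+)/layer_(\d+)/(.+)$")

# Rules for <rest>, i.e. the part inside one layer of one block.
_INNER = [
    (re.compile(r"^(SelfAttention|EncDecAttention)/(q|k|v|o)$"), r"\1.\2.weight"),
    (re.compile(r"^(SelfAttention|EncDecAttention)/relative_attention_bias$"),
     r"\1.relative_attention_bias.weight"),
    (re.compile(r"^DenseReluDense/(wi|wi_0|wi_1|wo)/kernel$"), r"DenseReluDense.\1.weight"),
    (re.compile(r"^(?:layer_norm|rms_norm)/scale$"), "layer_norm.weight"),
]


def tf_to_hf_key(tf_name: str) -> Optional[Tuple[str, bool]]:
    """Return (hf_key, needs_transpose), or None for unmapped variables."""
    if tf_name in _TOP:
        # Embeddings and 1-D norm scales are copied as-is; the lm_head kernel is transposed.
        needs_t = tf_name != "shared/embedding" and not tf_name.endswith("/scale")
        return _TOP[tf_name], needs_t
    m = _BLOCK.match(tf_name)
    if m is None:
        return None  # e.g. global_step
    stack, block, layer, rest = m.groups()
    for pattern, repl in _INNER:
        inner = pattern.match(rest)
        if inner:
            # int() turns zero-padded "000" into "0" to match HF module indices.
            key = f"{stack}.block.{int(block)}.layer.{int(layer)}.{inner.expand(repl)}"
            return key, not rest.endswith("/scale")
    return None  # e.g. optimizer slot variables


print(tf_to_hf_key("decoder/block_011/layer_002/DenseReluDense/wi_1/kernel"))
```

Variables that come back as `None` (optimizer slots, `global_step`, anything unexpected) are skipped. Printing them is a cheap way to spot names the rules do not cover yet.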


I have a question: can this new model be used for summarization in languages other than English, without fine-tuning it?

If the model was pre-trained in a multi-task fashion, then the answer is yes. However, further fine-tuning on a specific downstream task such as summarization may give better performance. Also, unlike the original T5, mT5 supports more than 100 languages (for example, the original T5Tokenizer cannot tokenize Chinese).

Hi @congcongwang!

How have you been trying to do it, if I may ask?
There is a way of doing it for other models, as shown here, but T5 is not among them.

Thank you!

Hi guys,

It seems mT5 uses the T5.1.1 architecture (not the original T5 architecture), as you can see from the model names T5-XL and T5-XXL instead of T5-3B and T5-11B.
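To make the architectural difference concrete: according to the T5 repository's release notes, T5.1.1 replaces the ReLU feed-forward block with a gated GELU (GEGLU) one, stops sharing parameters between the embedding and the output classifier, turns dropout off during pre-training, pre-trains on C4 only, and uses somewhat different model shapes for the xl/xxl sizes. The NumPy sketch below contrasts only the two feed-forward variants, with layer norm, dropout, and residual connections left out. The function names are illustrative, and the tanh approximation of GELU is an assumption of this sketch. (Transformers later exposed the gated variant as `T5Config(feed_forward_proj="gated-gelu")`.)

```python
import numpy as np


def gelu_tanh(x: np.ndarray) -> np.ndarray:
    """tanh approximation of GELU."""
    return 0.5 * x * (1.0 + np.tanh(np.sqrt(2.0 / np.pi) * (x + 0.044715 * x**3)))


def ffn_t5(x: np.ndarray, wi: np.ndarray, wo: np.ndarray) -> np.ndarray:
    """Original T5 feed-forward: ReLU between two projections (2 weight matrices)."""
    return np.maximum(x @ wi, 0.0) @ wo


def ffn_t5_1_1(x: np.ndarray, wi_0: np.ndarray, wi_1: np.ndarray, wo: np.ndarray) -> np.ndarray:
    """T5.1.1 feed-forward (GEGLU): a GELU branch gates a linear branch (3 weight matrices)."""
    return (gelu_tanh(x @ wi_0) * (x @ wi_1)) @ wo


rng = np.random.default_rng(0)
d_model, d_ff, n_tokens = 8, 32, 4
x = rng.standard_normal((n_tokens, d_model))        # [tokens, d_model]
wi = rng.standard_normal((d_model, d_ff))           # also used as wi_0 below
wi_1 = rng.standard_normal((d_model, d_ff))
wo = rng.standard_normal((d_ff, d_model))

print(ffn_t5(x, wi, wo).shape, ffn_t5_1_1(x, wi, wi_1, wo).shape)
```

Both blocks map `[tokens, d_model]` back to `[tokens, d_model]`, but the gated version carries `3 * d_model * d_ff` feed-forward weights instead of `2 * d_model * d_ff`. This extra `wi_0`/`wi_1` split is one reason original-T5 conversion code cannot load these checkpoints unchanged.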

In that case, HF doesn’t have an implementation of T5.1.1 yet. Please see:

UPDATED (Nov 17, 2020): it will be released soon by the amazing Patrick.


Thanks for the info. They are similar but different in some places.

Improved T5 models (small to large):

and mT5 models (small to large):

are on the model hub. I will upload the 3B and 11B versions in the coming days.