Pretraining an MT5 model for summarisation


I can see in the Hugging Face course that to pretrain a GPT-2 model from scratch (Training a causal language model from scratch - Hugging Face Course) we can use the following:

from transformers import AutoTokenizer, GPT2LMHeadModel, AutoConfig
config = AutoConfig.from_pretrained(

model = GPT2LMHeadModel(config)

But is this possible with Google's mT5 model? Is there an « LMHeadModel » for mT5 or other models?

Many thanks,


Isn’t this MT5ForConditionalGeneration?
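
For reference, here is a minimal sketch of building that class from a config rather than from a checkpoint, which is the same pattern as the GPT-2 snippet. The sizes below are made-up toy values chosen so the model builds quickly; they are not the settings of any released mT5 checkpoint.

```python
from transformers import MT5Config, MT5ForConditionalGeneration

# Deliberately tiny, hypothetical sizes for illustration only;
# real mT5 checkpoints are much larger.
config = MT5Config(
    vocab_size=1000,
    d_model=64,
    d_kv=16,
    d_ff=128,
    num_layers=2,
    num_heads=4,
)

# Passing a config (instead of calling from_pretrained) gives
# randomly initialised weights, i.e. a model "from scratch".
model = MT5ForConditionalGeneration(config)
print(f"{model.num_parameters():,} parameters")
```

In practice you would set vocab_size to match your tokenizer, as the course does for GPT-2.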

I am actually not sure. I’d like to pre-train it on raw text in an unsupervised way first, so I don’t know whether I can use the “ForConditionalGeneration” class from scratch. Otherwise I would have to use “MT5Model”.

I think you can use MT5ForConditionalGeneration; you just need to create a dataset with inputs and labels first. To train the model like GPT (with a language-modelling objective), the inputs would be the first N tokens of some text and the labels would be the rest of that text. It is not exactly like GPT, because the encoder is bidirectional.
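
To make the input/label split concrete, here is a rough sketch in plain Python. The function name make_prefix_lm_example, the toy token ids, and the choice of 1 as the end-of-sequence id are assumptions for illustration; use whatever your tokenizer actually produces.

```python
def make_prefix_lm_example(token_ids, prefix_len, eos_id=1):
    """Split one tokenized text into encoder inputs and decoder labels.

    The first `prefix_len` tokens become the input; the remaining
    tokens become the labels. `eos_id=1` is an assumed EOS id.
    """
    if not 0 < prefix_len < len(token_ids):
        raise ValueError("prefix_len must leave tokens on both sides")
    input_ids = token_ids[:prefix_len] + [eos_id]
    labels = token_ids[prefix_len:] + [eos_id]
    return {"input_ids": input_ids, "labels": labels}


# Toy token ids standing in for a tokenized sentence (hypothetical values).
tokens = [101, 102, 103, 104, 105, 106]
example = make_prefix_lm_example(tokens, prefix_len=4)
print(example)
```

A dictionary with input_ids and labels is the shape the model's forward pass expects, once the examples are padded and converted to tensors.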

Another way to pretrain would be to use a denoising objective (e.g. as described in the T5 documentation under “Unsupervised denoising training”).
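
As a rough illustration of what a denoising pair looks like, the sketch below masks chosen word spans with sentinel tokens. It works on whitespace-split words with hand-picked spans to keep things readable; real pretraining would pick spans at random and operate on token ids. The helper name corrupt_spans is made up, and the `<extra_id_N>` sentinel names follow the T5 convention, so check that your mT5 tokenizer defines them.

```python
def corrupt_spans(words, spans):
    """Replace each span with a sentinel and return (source, target).

    `spans` must be sorted, non-overlapping, half-open [start, end)
    word ranges. Sentinel names follow the T5 convention (assumed).
    """
    inputs, targets = [], []
    cursor = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inputs.extend(words[cursor:start])  # keep text before the span
        inputs.append(sentinel)             # mask the span in the input
        targets.append(sentinel)            # announce the span in the target
        targets.extend(words[start:end])    # the words the model must recover
        cursor = end
    inputs.extend(words[cursor:])           # keep the tail of the text
    targets.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return " ".join(inputs), " ".join(targets)


words = "The cute dog walks in the park".split()
source, target = corrupt_spans(words, [(1, 3), (5, 6)])
print(source)  # The <extra_id_0> walks in <extra_id_1> park
print(target)  # <extra_id_0> cute dog <extra_id_1> the <extra_id_2>
```

The source string is what you would tokenize as input_ids and the target string as labels.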

Hope this helps.