I'm actually not sure. Since I'd like to pre-train it on raw text in an unsupervised way first, I'm not sure I can use "ForConditionalGeneration" from scratch? Otherwise it would have to be an "MT5Model".
I think you can use MT5ForConditionalGeneration, you just need to create a dataset with inputs and labels first. To train the model like GPT (language-model objective), the inputs would be the first N tokens of some text and the labels would be the rest of that text. It is not exactly like GPT, though, because the encoder is bidirectional.
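As a minimal sketch of that data-preparation step (the function name and the split point `n` are just illustrative, not part of any library API), you could build (input, label) pairs from raw token sequences like this, and then feed both sides through the tokenizer into MT5ForConditionalGeneration:

```python
def make_lm_pair(tokens, n):
    """GPT-style continuation objective for an encoder-decoder model:
    the first n tokens become the encoder input, the remainder
    becomes the decoder target (labels)."""
    return tokens[:n], tokens[n:]

tokens = ["The", "cat", "sat", "on", "the", "mat", "."]
inputs, labels = make_lm_pair(tokens, 3)
# inputs -> ["The", "cat", "sat"]
# labels -> ["on", "the", "mat", "."]
```

In practice you'd run both halves through the MT5 tokenizer and pass the tokenized labels as the `labels` argument of the model's forward call, which computes the cross-entropy loss for you.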
Another way to pretrain would be to use a denoising objective (e.g. as described for the T5 model under "Unsupervised denoising training").
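A toy sketch of that span-corruption idea (the helper and its explicit `spans` argument are my simplification; the real T5 preprocessing samples spans randomly from a noise density): masked spans are replaced with sentinel tokens in the input, and the target reconstructs each masked span after its sentinel.

```python
def span_corrupt(tokens, spans):
    """T5-style denoising on a token list.
    spans: sorted, non-overlapping (start, end) index pairs to mask.
    Returns (corrupted_input, target), using <extra_id_i> sentinels."""
    inp, tgt = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[prev:start])  # keep the unmasked stretch
        inp.append(sentinel)            # one sentinel per masked span
        tgt.append(sentinel)            # target: sentinel then the span
        tgt.extend(tokens[start:end])
        prev = end
    inp.extend(tokens[prev:])
    tgt.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inp, tgt

tokens = ["Thank", "you", "for", "inviting", "me", "to", "your", "party"]
inp, tgt = span_corrupt(tokens, [(2, 4)])
# inp -> ["Thank", "you", "<extra_id_0>", "me", "to", "your", "party"]
# tgt -> ["<extra_id_0>", "for", "inviting", "<extra_id_1>"]
```

The MT5 tokenizer already includes the `<extra_id_*>` sentinel tokens, so the corrupted input and the target can be tokenized directly and passed as `input_ids` and `labels`.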