I'm actually not sure. Since I'd like to pre-train it on raw text in an unsupervised way first, I'm not sure I can use "ForConditionalGeneration" from scratch? Otherwise it would have to be an "MT5Model".
I think you can use MT5ForConditionalGeneration, you just need to create a dataset with inputs and labels first. To train the model like GPT (language-model objective), the inputs would be the first N tokens of some text and the labels would be the rest of that text. It is not exactly like GPT, though, because the encoder is bidirectional.
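As a minimal sketch of that data-preparation step (the function name and the split point `n` are just illustrative, not part of any library API), you could build (input, label) pairs from raw token sequences like this, and then feed both sides through the tokenizer into MT5ForConditionalGeneration:

```python
def make_lm_pair(tokens, n):
    """GPT-style continuation objective for an encoder-decoder model:
    the first n tokens become the encoder input, the remainder
    becomes the decoder target (labels)."""
    return tokens[:n], tokens[n:]

tokens = ["The", "cat", "sat", "on", "the", "mat", "."]
inputs, labels = make_lm_pair(tokens, 3)
# inputs -> ["The", "cat", "sat"]
# labels -> ["on", "the", "mat", "."]
```

In practice you'd run both halves through the MT5 tokenizer and pass the tokenized labels as the `labels` argument of the model's forward call, which computes the cross-entropy loss for you.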
Another way to pretrain would be to use a denoising objective (e.g. as described for the T5 model under "Unsupervised denoising training").
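A toy sketch of that span-corruption idea (the helper and its explicit `spans` argument are my simplification; the real T5 preprocessing samples spans randomly from a noise density): masked spans are replaced with sentinel tokens in the input, and the target reconstructs each masked span after its sentinel.

```python
def span_corrupt(tokens, spans):
    """T5-style denoising on a token list.
    spans: sorted, non-overlapping (start, end) index pairs to mask.
    Returns (corrupted_input, target), using <extra_id_i> sentinels."""
    inp, tgt = [], []
    prev = 0
    for i, (start, end) in enumerate(spans):
        sentinel = f"<extra_id_{i}>"
        inp.extend(tokens[prev:start])  # keep the unmasked stretch
        inp.append(sentinel)            # one sentinel per masked span
        tgt.append(sentinel)            # target: sentinel then the span
        tgt.extend(tokens[start:end])
        prev = end
    inp.extend(tokens[prev:])
    tgt.append(f"<extra_id_{len(spans)}>")  # closing sentinel
    return inp, tgt

tokens = ["Thank", "you", "for", "inviting", "me", "to", "your", "party"]
inp, tgt = span_corrupt(tokens, [(2, 4)])
# inp -> ["Thank", "you", "<extra_id_0>", "me", "to", "your", "party"]
# tgt -> ["<extra_id_0>", "for", "inviting", "<extra_id_1>"]
```

The MT5 tokenizer already includes the `<extra_id_*>` sentinel tokens, so the corrupted input and the target can be tokenized directly and passed as `input_ids` and `labels`.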