What happens in the MT5 documentation example?

Skylixia · January 10, 2021, 10:27am

Hi,
I’m trying to understand the example provided for the MT5 model, but I’m having some difficulty.

Here is the example:
from transformers import MT5Model, T5Tokenizer
model = MT5Model.from_pretrained("google/mt5-small")
tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
summary = "Weiter Verhandlung in Syrien."
batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], tgt_texts=[summary], return_tensors="pt")
outputs = model(input_ids=batch.input_ids, decoder_input_ids=batch.labels)
hidden_states = outputs.last_hidden_state

So I understand that tokenizer.prepare_seq2seq_batch encodes the input for the model. It returns a BatchEncoding containing the input_ids, labels and attention_mask.
However, I don’t understand what follows: what happens in model(input_ids=batch.input_ids, decoder_input_ids=batch.labels)? This does not train or fine-tune the model, so what does it do?
Why do we provide it with a source and a target, then? What if we wanted the model to generate the target (the summary)?

Thanks!

BramVanroy · January 10, 2021, 2:32pm

The example is just a general illustration of how to do a forward pass through the model, as you can with any model. In practice, you’d see something like this:

github.com

huggingface/transformers/blob/4f7022d68d4bae4b5e6a748b7a7323515c6fdcd3/examples/seq2seq/seq2seq_trainer.py#L162-L176


def _compute_loss(self, model, inputs, labels):
    if self.args.label_smoothing == 0:
        if self.data_args is not None and self.data_args.ignore_pad_token_for_loss:
            # force training to ignore pad token
            logits = model(**inputs, use_cache=False)[0]
            loss = self.loss_fn(logits.view(-1, logits.shape[-1]), labels.view(-1))
        else:
            # compute usual loss via models
            loss, logits = model(**inputs, labels=labels, use_cache=False)[:2]
    else:
        # compute label smoothed loss
        logits = model(**inputs, use_cache=False)[0]
        lprobs = torch.nn.functional.log_softmax(logits, dim=-1)
        loss, _ = self.loss_fn(lprobs, labels, self.args.label_smoothing, ignore_index=self.config.pad_token_id)
    return loss, logits
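As for why a target is passed at all: during training the decoder is fed the target sequence shifted one position to the right, and the loss compares its predictions against the unshifted labels. When `labels` are passed, T5-style models build that shifted decoder input themselves. Below is a plain-Python sketch of the shift, not the library code. The token ids are made up for illustration, and the defaults of 0 for both the pad and decoder start ids are assumed to match the mT5 configuration.

```python
from typing import List

IGNORE_INDEX = -100  # label value that the loss skips by convention


def shift_right(
    labels: List[int],
    decoder_start_token_id: int = 0,  # assumed mT5 default
    pad_token_id: int = 0,            # assumed mT5 default
) -> List[int]:
    """Build decoder inputs from labels: prepend the start id, drop the last label."""
    shifted = [decoder_start_token_id] + labels[:-1]
    # Ignored label positions cannot be embedded, so they become padding.
    return [pad_token_id if tok == IGNORE_INDEX else tok for tok in shifted]


# Hypothetical target ids: three tokens, </s> (id 1), then ignored padding.
labels = [259, 1234, 567, 1, IGNORE_INDEX, IGNORE_INDEX]
decoder_input_ids = shift_right(labels)
print(decoder_input_ids)  # [0, 259, 1234, 567, 1, 0]
```

At step i the decoder sees decoder_input_ids[: i + 1] and is scored on labels[i], which is why a forward pass needs both the source and the target.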

Jung · January 11, 2021, 2:06am

About the last question (generating strings): note that mT5 was pretrained only on the unsupervised masked-span objective, so, unlike the original T5, it cannot be used for downstream tasks without fine-tuning.

To test MLM, you could try the following snippet:

from transformers import MT5Tokenizer, TFMT5ForConditionalGeneration
model = TFMT5ForConditionalGeneration.from_pretrained('google/mt5-base')
tokenizer = MT5Tokenizer.from_pretrained('google/mt5-base', use_fast=True)

input_ids = tokenizer.encode("What the <extra_id_0> is going on inside this <extra_id_1> ?", return_tensors="tf")

output_ids = model.generate(input_ids)
print(tokenizer.batch_decode(output_ids))

This returns:

[' <extra_id_0> hell <extra_id_1> strange world <extra_id_2> … ']
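To read this output: the text after each sentinel token is the model's proposal for the matching sentinel in the input. Here is a small sketch that substitutes the spans back into the prompt. The helper names are hypothetical (they are not part of transformers), and the decoded string is only the visible part of the output above, with special tokens such as padding assumed to be already removed.

```python
import re
from typing import Dict

SENTINEL = re.compile(r"<extra_id_(\d+)>")


def parse_spans(decoded: str) -> Dict[int, str]:
    """Map each sentinel index to the text generated right after it."""
    # Splitting on a capturing group yields [prefix, idx, text, idx, text, ...].
    parts = SENTINEL.split(decoded)
    return {int(idx): text.strip() for idx, text in zip(parts[1::2], parts[2::2])}


def fill_template(template: str, spans: Dict[int, str]) -> str:
    """Replace each sentinel in the prompt with its generated span, if any."""
    return SENTINEL.sub(lambda m: spans.get(int(m.group(1)), m.group(0)), template)


template = "What the <extra_id_0> is going on inside this <extra_id_1> ?"
decoded = "<extra_id_0> hell <extra_id_1> strange world <extra_id_2>"
print(fill_template(template, parse_spans(decoded)))
# What the hell is going on inside this strange world ?
```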

Skylixia · January 11, 2021, 4:04pm

Ah, I see, thank you!
