What happens in the MT5 documentation example?

I’m trying to understand the provided example to the MT5 model but have some difficulties.

Here is the example:
from transformers import MT5Model, T5Tokenizer
model = MT5Model.from_pretrained("google/mt5-small")
tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")
article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
summary = "Weiter Verhandlung in Syrien."
batch = tokenizer.prepare_seq2seq_batch(src_texts=[article], tgt_texts=[summary], return_tensors="pt")
outputs = model(input_ids=batch.input_ids, decoder_input_ids=batch.labels)
hidden_states = outputs.last_hidden_state

So I understand that tokenizer.prepare_seq2seq_batch encodes the inputs to feed to the model. It returns a BatchEncoding containing input_ids, labels, and attention_mask.
However, I don't understand what follows. What happens in model(input_ids=batch.input_ids, decoder_input_ids=batch.labels)? This doesn't train or fine-tune the model, so what does it do?
And why do we provide both a source and a target, then? What if we wanted the model to generate the target (the summary)?

Thanks !

The example is just a general illustration of how to do a forward pass through the model, just like with any other model. In practice, you'd pass labels so that the model builds the decoder inputs itself and returns a loss you can train on.
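For instance, a training-style forward pass might look like the following. This is a minimal sketch: it swaps in MT5ForConditionalGeneration (the variant with a language-modeling head) instead of the bare MT5Model from the docs example, so that passing labels yields a loss.

```python
from transformers import MT5ForConditionalGeneration, T5Tokenizer

model = MT5ForConditionalGeneration.from_pretrained("google/mt5-small")
tokenizer = T5Tokenizer.from_pretrained("google/mt5-small")

article = "UN Offizier sagt, dass weiter verhandelt werden muss in Syrien."
summary = "Weiter Verhandlung in Syrien."

# Tokenize source and target (T5-family models share one vocabulary
# for both sides, so the same tokenizer call works for the summary).
inputs = tokenizer(article, return_tensors="pt")
labels = tokenizer(summary, return_tensors="pt").input_ids

# With labels supplied, the model shifts them right internally to
# build decoder_input_ids and returns a cross-entropy loss.
outputs = model(input_ids=inputs.input_ids,
                attention_mask=inputs.attention_mask,
                labels=labels)
print(outputs.loss)  # scalar tensor you would backpropagate through
```

During fine-tuning you would call outputs.loss.backward() and step an optimizer; the forward pass in the docs example just produces hidden states without any loss.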


About the last question (generating strings): note that mT5 is pretrained only on a masked language modeling objective, so unlike the original T5, it can't be used on downstream tasks without fine-tuning.

To test the MLM behavior, you can try the following snippet:

from transformers import MT5Tokenizer, TFMT5ForConditionalGeneration
model = TFMT5ForConditionalGeneration.from_pretrained('google/mt5-base')
tokenizer = MT5Tokenizer.from_pretrained('google/mt5-base', use_fast=True)

input_ids = tokenizer.encode("What the <extra_id_0> is going on inside this <extra_id_1> ?", return_tensors="tf")

output_ids = model.generate(input_ids)
print(tokenizer.decode(output_ids[0]))

which outputs something like:

<extra_id_0> hell <extra_id_1> strange world <extra_id_2> …


Ah, I see. Thank you!