Possible mistake in Summarization:
In the “Preprocessing the data” section, it says:
> The tokenizers in 🤗 Transformers provide a nifty `text_target` argument that allows you to tokenize the labels in parallel to the inputs. Here is an example of how the inputs and targets are processed for mT5:
It then provides the code below, but the code doesn't actually use the `text_target` argument when tokenizing the labels. Is that a mistake?
```python
max_input_length = 512
max_target_length = 30


def preprocess_function(examples):
    model_inputs = tokenizer(
        examples["review_body"],
        max_length=max_input_length,
        truncation=True,
    )
    labels = tokenizer(
        examples["review_title"], max_length=max_target_length, truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```
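
For reference, here is roughly what I expected based on the quoted text, with the titles passed through `text_target=` so the tokenizer treats them as labels. This is just a sketch, assuming the same `tokenizer` and column names as in the chapter:

```python
max_input_length = 512
max_target_length = 30


def preprocess_function(examples):
    # Tokenize the review bodies as the model inputs
    model_inputs = tokenizer(
        examples["review_body"],
        max_length=max_input_length,
        truncation=True,
    )
    # Tokenize the titles via text_target so the tokenizer applies any
    # target-side handling before producing the label IDs
    labels = tokenizer(
        text_target=examples["review_title"],
        max_length=max_target_length,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```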