Chapter 7 questions

Possible mistake in Summarization:
In the “Preprocessing the data” section, it says:

The tokenizers in 🤗 Transformers provide a nifty text_target argument that allows you to tokenize the labels in parallel to the inputs. Here is an example of how the inputs and targets are processed for mT5:

Then it provides the code below, but the code doesn’t actually use the `text_target` argument when tokenizing the labels. Is that a mistake?

```python
max_input_length = 512
max_target_length = 30


def preprocess_function(examples):
    model_inputs = tokenizer(
        examples["review_body"],
        max_length=max_input_length,
        truncation=True,
    )
    labels = tokenizer(
        examples["review_title"], max_length=max_target_length, truncation=True
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```
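
For comparison, here is how I would have expected the labels to be tokenized if the code matched the text. This is just a sketch, assuming a 🤗 Transformers version recent enough that the tokenizer’s `__call__` accepts a `text_target` keyword, and reusing the same `tokenizer` and length variables as above:

```python
def preprocess_function(examples):
    # Tokenize the inputs (review bodies) exactly as in the course snippet
    model_inputs = tokenizer(
        examples["review_body"],
        max_length=max_input_length,
        truncation=True,
    )
    # Tokenize the targets (review titles) via text_target, so the
    # tokenizer knows these are labels rather than inputs; assumes the
    # same mT5 tokenizer loaded earlier in the section
    labels = tokenizer(
        text_target=examples["review_title"],
        max_length=max_target_length,
        truncation=True,
    )
    model_inputs["labels"] = labels["input_ids"]
    return model_inputs
```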