T5 for IOB tagging requires prediction shifting

Hi everyone,

I built a T5 model for QA by loading a pretrained checkpoint and adjusting the LM head and the decoder embeddings. After training, the model works almost flawlessly, except that the predictions are shifted one position to the right relative to the labels. If I shift the predictions one slot to the left, the model gets 1.0 precision, recall and F1. I use 3 labels: 0 for B (beginning), 1 for O (outside) and 2 for I (inside).
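For concreteness, here is a toy illustration of that 3-class scheme (the sentence and answer span are made up):

```python
# Toy illustration of the IOB label scheme described above:
# 0 = B (first token of the answer span), 1 = O (outside), 2 = I (inside).
tokens = ["What", "a", "nice", "sunny", "day"]
labels = [1, 1, 0, 2, 2]  # hypothetical answer span: "nice sunny day"

# Recovering the span means keeping the B and I tokens.
span = [tok for tok, lab in zip(tokens, labels) if lab in (0, 2)]
print(span)  # ['nice', 'sunny', 'day']
```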

from sklearn.metrics import classification_report
import numpy as np

# Flatten labels and logits-argmax predictions from the eval output.
labels = val_preds.label_ids.reshape(-1)
predictions = val_preds.predictions[0].argmax(-1).reshape(-1)

# Drop padded positions (label -100 is ignored by the loss).
predictions = predictions[labels != -100]
labels = labels[labels != -100]

# Collapse to binary: I (2) -> 1, B and O -> 0.
predictions = np.where(predictions == 2, 1, 0)
labels = np.where(labels == 2, 1, 0)

# Shifting the predictions one position to the left makes them match perfectly.
predictions = np.roll(predictions, -1)
print(classification_report(labels, predictions, target_names=["O", "I"]))

I understand that during teacher forcing the labels become the decoder inputs, shifted one position to the right with the decoder start token prepended; but that alone doesn't explain why the predictions come out shifted.
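For reference, this is roughly what that shifting looks like, as a simplified numpy sketch of the `_shift_right` step in transformers' T5 (assuming `decoder_start_token_id = pad_token_id = 0`):

```python
import numpy as np

def shift_right(labels, decoder_start_token_id=0, pad_token_id=0):
    """Simplified sketch of T5's _shift_right: prepend the decoder start
    token, drop the last position, and replace -100 with the pad id."""
    shifted = np.roll(labels, 1, axis=-1)
    shifted[..., 0] = decoder_start_token_id
    shifted = np.where(shifted == -100, pad_token_id, shifted)
    return shifted

labels = np.array([[0, 2, 2, 1, -100]])
print(shift_right(labels))  # [[0 0 2 2 1]]
```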

T5 is trained with teacher forcing, so my hunch is that the model simply learned to copy the input fed in by the teacher.
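That hypothesis is at least consistent with the symptom: if the model just copied its shifted decoder input, its outputs would be the labels rolled one position to the right, so rolling the predictions one position to the left would restore a perfect match. A toy check:

```python
import numpy as np

labels = np.array([1, 1, 0, 2, 2, 1])
decoder_input = np.roll(labels, 1)  # teacher-forced input: labels shifted right
predictions = decoder_input         # a model that merely copies its input

print((predictions == labels).mean())              # mostly misaligned as-is
print((np.roll(predictions, -1) == labels).all())  # True after a left shift
```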