How to use mask tokens information in EvalPrediction for token classification tasks?

AndreaSottana · November 18, 2021, 11:32am

Hi,
I’m training a NER model (i.e. a token-level classification task) on a custom dataset, using the transformers.Trainer class. I want to compute some evaluation metrics (such as F1, precision and recall) using seqeval.metrics.classification_report.
The problem is that the Trainer takes a compute_metrics argument, which should be a callable (i.e. a function) that takes an EvalPrediction object as its argument. This object only seems to have two attributes, predictions and label_ids, so in the Trainer I would set compute_metrics=get_metrics, where get_metrics is something like this:

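```python
from seqeval.metrics import classification_report
from transformers import EvalPrediction

def get_metrics(p: EvalPrediction):
    predictions = p.predictions  # model logits, shape (batch, seq_len, num_labels)
    label_ids = p.label_ids      # gold label ids, shape (batch, seq_len)
    # DO SOME CUSTOM PROCESSING HERE: build y_true and y_pred,
    # lists of per-sentence lists of tag strings, from the arrays above
    report = classification_report(y_true, y_pred, output_dict=True)
    return report
```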
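For a token classification model, `p.predictions` usually holds the raw logits with shape `(batch, seq_len, num_labels)` and `p.label_ids` the gold ids with shape `(batch, seq_len)`, both as NumPy arrays. A minimal sketch of the first processing step (decoding logits to label ids and then to tag strings), assuming a hypothetical `ID2LABEL` mapping and a hand-made namedtuple standing in for `EvalPrediction` so it runs without `transformers`:

```python
from collections import namedtuple

import numpy as np

# Hypothetical stand-in with the same two fields as transformers.EvalPrediction
FakeEvalPrediction = namedtuple("FakeEvalPrediction", ["predictions", "label_ids"])

# Hypothetical label mapping; with a real model, model.config.id2label plays this role
ID2LABEL = {0: "O", 1: "B-PER", 2: "I-PER"}


def decode(p):
    """Turn logits into per-sentence lists of tag strings (no masking yet)."""
    pred_ids = np.argmax(p.predictions, axis=-1)  # (batch, seq_len)
    return [[ID2LABEL[int(i)] for i in row] for row in pred_ids]
```

Note that every position gets a tag, including padding: in a batch whose last token is padding, `decode` still emits a prediction for it, which is exactly the noise that ends up in the metrics if nothing is filtered out.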

The problem I’m having with a token-level classification task is that many sentences are shorter than the maximum sequence length, so there will be a lot of [PAD] tokens. I don’t want to take predictions on [PAD] tokens into account when calculating my model’s metrics: they are meaningless and would give a misleading (possibly worse) picture of the model’s performance. Hence I would like to include information about the mask tokens (i.e. which positions are padding) within the get_metrics function, so that I can use something like the torch.masked_select function to remove labels and predictions coming from padded tokens. Is there an easy way to do this, short of giving up on transformers.Trainer and writing my own custom training loop?
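One common approach, assuming padded (and any other ignored) positions carry the label id `-100` (the default `label_pad_token_id` of `DataCollatorForTokenClassification`, and the index PyTorch's `CrossEntropyLoss` ignores by default), is to use `label_ids` itself as the mask, so no separate attention mask is needed. Because `EvalPrediction` holds NumPy arrays rather than tensors, boolean indexing plays the role of `torch.masked_select`. A sketch, where `strip_padding` and the `id2label` argument are hypothetical names:

```python
import numpy as np

IGNORE_INDEX = -100  # assumed label id for padded / ignored positions


def strip_padding(predictions, label_ids, id2label):
    """Drop every position whose gold label is IGNORE_INDEX.

    predictions: logits of shape (batch, seq_len, num_labels)
    label_ids:   gold ids of shape (batch, seq_len)
    Returns (y_true, y_pred) as lists of per-sentence tag-string lists,
    the format seqeval's classification_report expects.
    """
    pred_ids = np.argmax(predictions, axis=-1)  # (batch, seq_len)
    y_true, y_pred = [], []
    for pred_row, label_row in zip(pred_ids, label_ids):
        keep = label_row != IGNORE_INDEX  # boolean mask over real tokens
        y_true.append([id2label[int(i)] for i in label_row[keep]])
        y_pred.append([id2label[int(i)] for i in pred_row[keep]])
    return y_true, y_pred
```

Inside get_metrics this would replace the custom-processing placeholder, e.g. `y_true, y_pred = strip_padding(p.predictions, p.label_ids, model.config.id2label)`. If the labels were padded with a different value, change `IGNORE_INDEX` accordingly; if they were not padded with a sentinel at all, this trick does not apply and the padding positions have to be recovered some other way.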

Many thanks
