Calculate precision, recall, f1 score for custom dataset for multiclass classification

I am trying to do multiclass classification for a sentence-pair task. I uploaded the train and test splits of my custom dataset separately to the Hugging Face Hub, trained and tested my model, and now I want to see the F1 score and accuracy.

I tried

metric = load_metric("glue", "mrpc")

metric.add_batch(predictions=predictions, references=references)

but it says

ValueError: Target is multiclass but average='binary'. Please choose another average setting, one of [None, 'micro', 'macro', 'weighted'].

How can I fix this and print precision, recall, and f1 score?

@sgugger any help on this?

I haven’t tried this locally, but I think you can pass average in **kwargs. Maybe something like:

metric.add_batch(predictions=predictions, references=references, average="micro")

should work. As the error says, the binary average only works for binary classification problems.

@merve I tried it, but it doesn’t work.

Okay, I realized what was wrong.
MRPC itself is a binary classification task, so the "glue", "mrpc" metric expects binary targets. You’re loading the MRPC metric, yet the error says your dataset is multiclass. Is that right?

Apparently you can’t change the average argument of the MRPC metric, and for good reason: it is defined for a binary task.

@merve Do you have any idea which metric I should use for multiclass classification if I want precision, recall, F1, and accuracy?

Were you able to solve the issue? Neither approach works for me.

Hey, you can use the following:

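from datasets import load_metric

# Load the standalone precision metric, which accepts an average argument
precision_metric = load_metric("precision")

precision = precision_metric.compute(predictions=y_pred, references=y_test, average="weighted")["precision"]

You can do the same for recall and F1 too: load "recall" or "f1" and read the matching key from the result. If you want another averaging scheme, such as micro or macro, change the value of average.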
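To make the average setting more concrete, here is a minimal pure-Python sketch of what "macro" and "weighted" averaging compute for a multiclass problem. The helper names per_class_scores and averaged are made up for this illustration and are not part of datasets or scikit-learn; in practice sklearn.metrics.precision_recall_fscore_support reports the same quantities.

```python
from collections import Counter


def per_class_scores(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} for every label seen."""
    labels = sorted(set(y_true) | set(y_pred))
    true_count = Counter(y_true)   # how often each label is the reference
    pred_count = Counter(y_pred)   # how often each label is predicted
    true_pos = Counter(t for t, p in zip(y_true, y_pred) if t == p)

    scores = {}
    for label in labels:
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / true_count[label] if true_count[label] else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        scores[label] = (precision, recall, f1, true_count[label])
    return scores


def averaged(scores, average="macro"):
    """Combine per-class scores into one (precision, recall, f1) tuple."""
    rows = list(scores.values())
    if average == "macro":
        weights = [1.0] * len(rows)          # every class counts equally
    elif average == "weighted":
        weights = [row[3] for row in rows]   # classes weighted by support
    else:
        raise ValueError("this sketch only handles 'macro' and 'weighted'")
    total = sum(weights)
    return tuple(
        sum(w * row[i] for w, row in zip(weights, rows)) / total
        for i in range(3)
    )


# Toy labels, invented for the example (three classes: 0, 1, 2)
y_test = [0, 0, 1, 1, 2, 2]
y_pred = [0, 1, 1, 1, 2, 0]

scores = per_class_scores(y_test, y_pred)
macro_precision, macro_recall, macro_f1 = averaged(scores, "macro")
accuracy = sum(t == p for t, p in zip(y_test, y_pred)) / len(y_test)
print(macro_precision, macro_recall, macro_f1, accuracy)
```

With "binary" there is only one positive class to score, which is why it cannot be applied to targets with three or more labels.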

How do I generate y_pred here? I tried to do it, but it’s not working.

y_pred is the prediction of your model, i.e. one predicted class label per test example.
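As a rough sketch, assuming your model returns one row of logits per example (shape: number of examples by number of classes), you can get y_pred by taking the argmax over the class axis. The logits array below is invented for illustration; if you use the Trainer API, trainer.predict(test_dataset).predictions would typically be where the real logits come from, but that depends on your setup.

```python
import numpy as np

# Hypothetical logits for 3 test examples and 3 classes.
# In a real run these would come from your model's outputs.
logits = np.array([
    [2.1, 0.3, -1.0],
    [0.2, 0.1, 3.4],
    [-0.5, 1.7, 0.9],
])

# The predicted class is the index of the largest logit in each row
y_pred = np.argmax(logits, axis=-1)
print(y_pred.tolist())
```

y_pred can then be passed as predictions, with your true test labels as references.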