F1 is always 0 for a multi-label classification task

Hello, it’s my first time trying to fine-tune a model, and I’m having trouble getting a decent F1 score.

The source code of my fine-tuning experiment is here: mitbforalldemo/fine_tuning/bert-fine-tuning.ipynb at main · calvinli2024/mitbforalldemo · GitHub

I’m using a HF dataset that I found here: maximuspowers/philosophy-schools-multilabel · Datasets at Hugging Face

I’ve examined both the fine-tuning code and the dataset, and I can’t find any glaring issues with either, and yet my F1 score is always zero when I train.

I can get a non-zero F1 for other multi-label datasets with the same code, so I doubt the fine-tuning code is the problem. And I can’t see how anything could be wrong with a dataset as simple and well-organized as the one I linked.
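
For context, the setup follows the standard multi-label recipe, roughly like this (a simplified sketch; the model name and NUM_LABELS are placeholders, not the notebook’s exact values):

from transformers import AutoModelForSequenceClassification

NUM_LABELS = 10  # placeholder: the number of label columns in the dataset

# For multi-label tasks the model must be configured to apply a sigmoid per
# label with BCEWithLogitsLoss, rather than softmax + cross-entropy:
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased",
    num_labels=NUM_LABELS,
    problem_type="multi_label_classification",
)

# Each example's "labels" must be a float multi-hot vector, e.g.
# [0.0, 1.0, 0.0, ...] with dtype float32. If the labels stay integers and
# problem_type is unset, the model silently computes single-label
# cross-entropy instead.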

Any help would be appreciated!


How about something like this? With a multi-label head, each label gets an independent sigmoid probability, and early in training those probabilities often all sit below 0.5, so a default 0.5 threshold predicts all zeros and F1 is exactly 0. Lowering the threshold usually gets you off the floor:

import numpy as np
from sklearn.metrics import f1_score

def compute_metrics(eval_pred):
    logits, labels = eval_pred
    probs = 1 / (1 + np.exp(-logits))        # sigmoid
    preds = (probs >= 0.30).astype(np.int32) # start at 0.30; tune later
    return {
        "f1": f1_score(labels, preds, average="weighted", zero_division=0),
    }
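
If the score is still 0 after that, sweep the threshold on the validation set instead of guessing. A minimal sketch, assuming you already have the eval logits and multi-hot labels as NumPy arrays:

import numpy as np
from sklearn.metrics import f1_score

def best_threshold(logits, labels, grid=np.linspace(0.05, 0.50, 10)):
    """Return the (threshold, weighted F1) pair that scores best on held-out data."""
    probs = 1 / (1 + np.exp(-logits))  # sigmoid per label
    scores = [
        f1_score(labels, (probs >= t).astype(int),
                 average="weighted", zero_division=0)
        for t in grid
    ]
    best = int(np.argmax(scores))
    return grid[best], scores[best]

With very sparse labels even 0.30 can be too high, so the per-class positive rate in the training split is a useful guide for where to start the sweep.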