I have a classification task where I fine-tuned a pretrained BERT model on customer reviews (classifying a text as "customer service text", "user experience", etc.):
As you can see, I have 8 distinct classes. My fine-tuned classification model performs quite well on unseen data, with an F1 score of 80.1. However, a single text can belong to 2 different classes. My question is: how do I have to change my code to achieve that? I have already transformed my target variable with MultiLabelBinarizer, so that it looks like this:
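(For illustration, a minimal sketch of that step; only two of the 8 class names are taken from the description above, the rest are placeholders:)

```python
from sklearn.preprocessing import MultiLabelBinarizer

# two of the 8 classes are named above; the remaining names are placeholders
classes = ["customer service text", "user experience", "other_1", "other_2",
           "other_3", "other_4", "other_5", "other_6"]

mlb = MultiLabelBinarizer(classes=classes)
y = mlb.fit_transform([
    ["customer service text"],                      # text with a single class
    ["customer service text", "user experience"],   # text belonging to two classes
])
print(y)
# [[1 0 0 0 0 0 0 0]
#  [1 1 0 0 0 0 0 0]]
```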
You can set the problem_type of an xxxForSequenceClassification model to multi_label_classification when instantiating it, like so:
```python
from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-german-dbmdz-uncased",
    problem_type="multi_label_classification",
    num_labels=num_labels_cla,
)
```
This ensures that BCEWithLogitsLoss is used instead of CrossEntropyLoss, which is what multi-label classification requires. You can then fine-tune just as you would for multi-class classification; the only difference is that the labels are multi-hot float vectors.
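A minimal sketch of a single forward pass under these assumptions (the checkpoint name and example text are illustrative; the key point is that the labels are passed as float multi-hot vectors):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

model_name = "bert-base-german-dbmdz-uncased"  # illustrative checkpoint
num_labels_cla = 8

tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(
    model_name,
    problem_type="multi_label_classification",
    num_labels=num_labels_cla,
)

texts = ["Der Kundenservice war sehr freundlich."]  # hypothetical review
# multi-hot labels must be floats of shape (batch_size, num_labels)
labels = torch.tensor([[1., 0., 0., 0., 0., 0., 1., 0.]])

inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
outputs = model(**inputs, labels=labels)
print(outputs.loss)          # BCEWithLogitsLoss value
print(outputs.logits.shape)  # torch.Size([1, 8])
```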
The logits will be of shape (batch_size, num_labels). The docs of PyTorch's BCEWithLogitsLoss indicate that the labels should have the same shape, so that's indeed correct.
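Continuing the sketch above, at inference time you typically apply a sigmoid to the logits and threshold each class independently (0.5 is a common starting point, not a fixed rule):

```python
import torch

with torch.no_grad():
    logits = model(**inputs).logits  # shape (batch_size, num_labels)

probs = torch.sigmoid(logits)        # independent per-class probabilities
preds = (probs > 0.5).int()          # multi-hot predictions, same shape as the labels
print(preds)
```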