How do I do multi-class (multi-head) classification?

I want to use BERT models to do multi-class (multi-head) classification. I have text and want to do one binary classification for churn and another binary classification for sentiment.

Is that possible “out of the box”? Or do I have to develop my own “BertForSequenceClassification” Class?


This is all in the loss function, so you can definitely use BertForSequenceClassification with two labels, then use the proper loss function (probably BCEWithLogitsLoss).
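A minimal sketch of what this looks like, assuming the two labels are treated as independent binary targets (churn and sentiment) on top of a pooled BERT output. The pooled vector is faked with random numbers here; in practice it would come from the BERT encoder, and the head corresponds to the classifier layer inside BertForSequenceClassification:

```python
import torch
import torch.nn as nn

# Stand-in for the pooled [CLS] representation from BERT
# (batch of 4 examples, hidden size 768).
batch_size, hidden_size = 4, 768
pooled = torch.randn(batch_size, hidden_size)

# One logit per binary task: column 0 = churn, column 1 = sentiment.
head = nn.Linear(hidden_size, 2)
logits = head(pooled)  # shape (4, 2), no activation applied

# Float targets, one column per task.
targets = torch.tensor([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])

# BCEWithLogitsLoss applies the sigmoid internally, so the model
# itself can (and should) output raw logits.
loss = nn.BCEWithLogitsLoss()(logits, targets)
```

Each of the two output columns is then an independent binary prediction, which is exactly the "multiple binary classifications from one model" setup asked about.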


Well - this is connected to this question: BertForSequenceClassification only seems to have linear activation at the end - is this a bug?

Why is it only a matter of the loss function? IMO the different classification setups need different final-layer activations: binary classification needs sigmoid, single-label multi-class needs softmax, and multi-label classification needs sigmoid again. But somehow you always seem to have a linear (no) activation at the end. @sgugger

Isn’t this a bug?

In PyTorch, the activation is often combined with the loss for numerical stability and speed. That’s why I say it’s all in the loss.
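To illustrate the point: in PyTorch, `BCEWithLogitsLoss` is the fused version of sigmoid followed by `BCELoss`. The two give (numerically almost) the same result, but the fused version uses the log-sum-exp trick internally and stays stable for large-magnitude logits:

```python
import torch

logits = torch.tensor([10.0, -10.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])

# Fused: sigmoid + binary cross-entropy in one numerically stable op.
fused = torch.nn.BCEWithLogitsLoss()(logits, targets)

# Separate: apply sigmoid explicitly, then plain BCELoss.
split = torch.nn.BCELoss()(torch.sigmoid(logits), targets)
```

This is why the model head can end in a plain linear layer: the "missing" activation lives inside the loss.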

Ahh ok. Thanks. When I want to calculate metrics on the predictions, in most cases a simple preds = pred.predictions.argmax(-1) is enough; there is no need to apply softmax or sigmoid in front of it. Only in the following cases do I have to apply an “activation function” with numpy:

  • when I have multi-label classification I can not do argmax
  • when I want to compute metrics like ROC-AUC where I need “probability estimates”
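Both cases from the list above can be handled with a NumPy sigmoid, sketched here with hypothetical logits for two binary labels (churn, sentiment):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw logits for 3 examples and 2 independent binary labels.
logits = np.array([[ 2.1, -0.3],
                   [-1.5,  0.8],
                   [ 0.2,  3.0]])

# Probability estimates, e.g. what roc_auc_score expects as scores.
probs = sigmoid(logits)

# Multi-label prediction: threshold each label instead of argmax.
preds = (probs > 0.5).astype(int)
```

For single-label argmax, skipping the softmax is fine because softmax is monotonic and does not change which logit is largest; here, the sigmoid is needed because thresholding and ROC-AUC both operate on the probability scale.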

@sgugger @valhalla is that right?

If you have in fact two binary classifications, wouldn’t you be better off using two binary classifiers instead of one with four labels?