How do I do multi-class (multi-head) classification?

I want to use BERT models to do multi-class (multi-head) classification. I have text and want to do one binary classification for churn and one binary classification for sentiment.

Is that possible “out of the box”, or do I have to write my own “BertForSequenceClassification” class?


This is all in the loss function, so you can definitely use BertForSequenceClassification with two labels, then use the proper loss function (probably BCEWithLogitsLoss).
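To make that concrete, here is a minimal sketch of the loss side only. It assumes the model has two output logits (column 0 for churn, column 1 for sentiment; that ordering is my choice, not something from the thread) and uses a random tensor as a stand-in for the model output, so it runs without downloading any weights. As far as I know, recent versions of transformers can also pick this loss for you when the config has problem_type="multi_label_classification", but the sketch does not rely on that.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Stand-in for the logits of a classifier with two outputs
# (assumed layout: column 0 = churn, column 1 = sentiment).
logits = torch.randn(4, 2)

# Two independent binary targets per example, as floats.
targets = torch.tensor([[1.0, 0.0],
                        [0.0, 0.0],
                        [1.0, 1.0],
                        [0.0, 1.0]])

# BCEWithLogitsLoss applies the sigmoid internally, so it takes raw logits.
loss_fn = nn.BCEWithLogitsLoss()
loss = loss_fn(logits, targets)

# At prediction time each head is thresholded on its own.
probs = torch.sigmoid(logits)
preds = (probs > 0.5).long()
```

Each column is treated as its own yes/no decision, so the two tasks share the encoder but do not compete with each other the way softmax classes do.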

Well - this is connected to this question: BertForSequenceClassification only seems to have linear activation at the end - is this a bug?

Why does it only come down to the loss function? IMO the different classification setups need different last-layer activation functions: binary classification needs sigmoid, single-label multi-class (one of several classes) needs softmax, and multi-label (several of several classes) needs sigmoid again. But you always seem to have a linear (i.e. no) activation at the end. @sgugger

Isn’t this a bug?

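In PyTorch, the activation is often combined with the loss for numerical stability and speed: CrossEntropyLoss applies log-softmax internally and BCEWithLogitsLoss applies sigmoid internally, so the model itself outputs raw logits. That’s why I say it’s all in the loss.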
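A quick way to convince yourself is to compare the fused losses with the explicit "activation, then loss" versions. This is only a sketch on random logits with made-up targets; the tensor shapes and values are arbitrary.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

torch.manual_seed(0)
logits = torch.randn(4, 3)  # raw model outputs, no activation applied

# Single-label multi-class: CrossEntropyLoss == log_softmax followed by NLLLoss
class_targets = torch.tensor([0, 2, 1, 2])
fused_ce = nn.CrossEntropyLoss()(logits, class_targets)
manual_ce = nn.NLLLoss()(F.log_softmax(logits, dim=-1), class_targets)

# Binary / multi-label: BCEWithLogitsLoss == sigmoid followed by BCELoss
label_targets = torch.tensor([[1.0, 0.0, 1.0],
                              [0.0, 0.0, 1.0],
                              [1.0, 1.0, 0.0],
                              [0.0, 1.0, 0.0]])
fused_bce = nn.BCEWithLogitsLoss()(logits, label_targets)
manual_bce = nn.BCELoss()(torch.sigmoid(logits), label_targets)

ce_match = torch.allclose(fused_ce, manual_ce, atol=1e-5)
bce_match = torch.allclose(fused_bce, manual_bce, atol=1e-5)
print(ce_match, bce_match)
```

The two routes agree on ordinary inputs; the fused versions are preferred because they stay stable when the logits get large.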

Ahh, ok. Thanks. When I want to calculate metrics on the predictions, in most cases a simple preds = pred.predictions.argmax(-1) is enough. There is no need to apply softmax or sigmoid in front of it. Only in the following cases do I have to apply an “activation function” with numpy:

  • when I have multi-label classification, where I cannot use argmax
  • when I want to compute metrics like ROC-AUC where I need “probability estimates”
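The cases above could look roughly like this in numpy. It is a sketch only: logits stands in for pred.predictions, the values are made up, and the 0.5 threshold for the multi-label case is an assumption you may want to tune.

```python
import numpy as np

def sigmoid(x):
    # Plain sigmoid; fine for moderate logits.
    return 1.0 / (1.0 + np.exp(-x))

def softmax(x, axis=-1):
    # Subtract the row maximum first so the exponentials cannot overflow.
    shifted = x - x.max(axis=axis, keepdims=True)
    e = np.exp(shifted)
    return e / e.sum(axis=axis, keepdims=True)

# Stand-in for pred.predictions (raw logits, 3 examples x 2 outputs).
logits = np.array([[2.0, -1.0],
                   [-0.5, 0.3],
                   [0.1, 1.5]])

# Single-label: softmax is monotonic, so argmax on the logits is enough.
single_preds = logits.argmax(-1)

# Multi-label: threshold each sigmoid probability independently.
multi_preds = (sigmoid(logits) > 0.5).astype(int)

# Metrics such as ROC-AUC need probability estimates, not hard labels.
class_probs = softmax(logits)
```

Thresholding the sigmoid at 0.5 is the same as checking whether the logit is above 0, so the activation only really matters once you need the probabilities themselves.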

@sgugger @valhalla is that right?

If you in fact have two binary classifications, wouldn’t you be better off using two binary classifiers instead of one with four labels?