How do I do multi-class (multi-head) classification?

I want to use BERT models to do multi-class (multi-head) classification. I have text and want to do one binary classification for churn and another binary classification for sentiment.

Is that possible “out of the box”? Or do I have to develop my own “BertForSequenceClassification” Class?


This is all in the loss function, so you can definitely use BertForSequenceClassification with two labels, then use the proper loss function (probably BCEWithLogitsLoss).
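A minimal sketch of what this looks like, assuming the two labels are treated as independent binary targets (churn and sentiment) on top of a pooled BERT output. The pooled vector is faked with random numbers here; in practice it would come from the BERT encoder, and the head corresponds to the classifier layer inside BertForSequenceClassification:

```python
import torch
import torch.nn as nn

# Stand-in for the pooled [CLS] representation from BERT
# (batch of 4 examples, hidden size 768).
batch_size, hidden_size = 4, 768
pooled = torch.randn(batch_size, hidden_size)

# One logit per binary task: column 0 = churn, column 1 = sentiment.
head = nn.Linear(hidden_size, 2)
logits = head(pooled)  # shape (4, 2), no activation applied

# Float targets, one column per task.
targets = torch.tensor([[1., 0.], [0., 1.], [1., 1.], [0., 0.]])

# BCEWithLogitsLoss applies the sigmoid internally, so the model
# itself can (and should) output raw logits.
loss = nn.BCEWithLogitsLoss()(logits, targets)
```

Each of the two output columns is then an independent binary prediction, which is exactly the "multiple binary classifications from one model" setup asked about.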


Well - this is connected to this question: BertForSequenceClassification only seems to have linear activation at the end - is this a bug?

Why is it only a matter of the loss function? IMO the different classification setups need different final-layer activations: binary classification needs sigmoid, single-label multi-class needs softmax, and multi-label classification needs sigmoid again. But somehow you always seem to have a linear (no) activation at the end. @sgugger

Isn’t this a bug?

In PyTorch, the activation is often combined with the loss for numerical stability and speed. That’s why I say it’s all in the loss.
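To illustrate the point: in PyTorch, `BCEWithLogitsLoss` is the fused version of sigmoid followed by `BCELoss`. The two give (numerically almost) the same result, but the fused version uses the log-sum-exp trick internally and stays stable for large-magnitude logits:

```python
import torch

logits = torch.tensor([10.0, -10.0, 0.5])
targets = torch.tensor([1.0, 0.0, 1.0])

# Fused: sigmoid + binary cross-entropy in one numerically stable op.
fused = torch.nn.BCEWithLogitsLoss()(logits, targets)

# Separate: apply sigmoid explicitly, then plain BCELoss.
split = torch.nn.BCELoss()(torch.sigmoid(logits), targets)
```

This is why the model head can end in a plain linear layer: the "missing" activation lives inside the loss.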

Ahh ok. Thanks. When I want to calculate metrics on the predictions, in most cases a simple preds = pred.predictions.argmax(-1) is enough; there is no need to apply softmax or sigmoid in front of it. Only in the following cases do I have to apply an “activation function” with numpy:

  • when I have multi-label classification I can not do argmax
  • when I want to compute metrics like ROC-AUC where I need “probability estimates”
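Both cases from the list above can be handled with a NumPy sigmoid, sketched here with hypothetical logits for two binary labels (churn, sentiment):

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

# Hypothetical raw logits for 3 examples and 2 independent binary labels.
logits = np.array([[ 2.1, -0.3],
                   [-1.5,  0.8],
                   [ 0.2,  3.0]])

# Probability estimates, e.g. what roc_auc_score expects as scores.
probs = sigmoid(logits)

# Multi-label prediction: threshold each label instead of argmax.
preds = (probs > 0.5).astype(int)
```

For single-label argmax, skipping the softmax is fine because softmax is monotonic and does not change which logit is largest; here, the sigmoid is needed because thresholding and ROC-AUC both operate on the probability scale.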

@sgugger @valhalla is that right?

If you have in fact two binary classifications, wouldn’t you be better off using two binary classifiers instead of one with four labels?