BertForSequenceClassification only seems to have linear activation at the end - is this a bug?

The BertForSequenceClassification class seems to have only a linear activation at the end of the classification head.

See here:

IMO, for binary classification it should have a sigmoid at the end, and for multi-class classification there should be a softmax at the end. Why is this the case?


Hi @PhilipMay
This is not a bug. At L1351 the pooled output is passed through the classification head (a linear layer) to get the logits.

CrossEntropyLoss does not require a softmax; it computes the loss directly from the logits (it applies log_softmax internally).
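A minimal sketch of why no softmax is needed, using hypothetical logit values: PyTorch's `nn.CrossEntropyLoss` on raw logits is equivalent to `log_softmax` followed by `nll_loss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical raw logits for a batch of 2 examples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.2, 0.3]])
labels = torch.tensor([0, 1])

# CrossEntropyLoss consumes the logits directly...
loss = nn.CrossEntropyLoss()(logits, labels)

# ...because it applies log_softmax internally; this is equivalent:
manual = F.nll_loss(F.log_softmax(logits, dim=-1), labels)

assert torch.allclose(loss, manual)
```

So placing a softmax inside the model would be redundant for training (and would hurt numerical stability).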

BertForSequenceClassification returns logits; you can then apply softmax to the returned logits to get the class probabilities.
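For inference, a short sketch with hypothetical logit values standing in for the model's output, showing softmax for the multi-class case and sigmoid for a single-logit binary head:

```python
import torch

# Hypothetical logits as returned by BertForSequenceClassification
# for a batch of 2 examples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.2, 0.3]])

# Multi-class: softmax over the class dimension gives probabilities
probs = torch.softmax(logits, dim=-1)
preds = probs.argmax(dim=-1)

# Binary case with a single output logit: apply sigmoid instead
binary_logit = torch.tensor([0.7])
binary_prob = torch.sigmoid(binary_logit)
```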