BertForSequenceClassification only seems to have linear activation at the end - is this a bug?

The BertForSequenceClassification class seems to have only a linear activation at the end of the classification head.

See here:

IMO, for binary classification it should have a sigmoid at the end, and for multi-class classification there should be a softmax at the end. Why is this the case?


Hi @PhilipMay
This is not a bug. At L1351 the pooled output is passed through the classification head (a linear layer) to get the logits.

CrossEntropyLoss does not require a softmax; it computes the loss directly from the logits (it applies log_softmax internally).
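A minimal sketch of why no softmax is needed, using hypothetical logit values: PyTorch's `nn.CrossEntropyLoss` on raw logits is equivalent to `log_softmax` followed by `nll_loss`.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

# Hypothetical raw logits for a batch of 2 examples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.2, 0.3]])
labels = torch.tensor([0, 1])

# CrossEntropyLoss consumes the logits directly...
loss = nn.CrossEntropyLoss()(logits, labels)

# ...because it applies log_softmax internally; this is equivalent:
manual = F.nll_loss(F.log_softmax(logits, dim=-1), labels)

assert torch.allclose(loss, manual)
```

So placing a softmax inside the model would be redundant for training (and would hurt numerical stability).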

BertForSequenceClassification returns logits; you can then apply softmax to the returned logits to get the class probabilities.
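For inference, a short sketch with hypothetical logit values standing in for the model's output, showing softmax for the multi-class case and sigmoid for a single-logit binary head:

```python
import torch

# Hypothetical logits as returned by BertForSequenceClassification
# for a batch of 2 examples and 3 classes
logits = torch.tensor([[2.0, 0.5, -1.0],
                       [0.1, 1.2, 0.3]])

# Multi-class: softmax over the class dimension gives probabilities
probs = torch.softmax(logits, dim=-1)
preds = probs.argmax(dim=-1)

# Binary case with a single output logit: apply sigmoid instead
binary_logit = torch.tensor([0.7])
binary_prob = torch.sigmoid(binary_logit)
```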