Metrics mismatch between BertForSequenceClassification Class and my custom Bert Classification

It’s a good question, but I don’t know the answer, sorry.

(When I tried to add a custom head to a BERT model, I couldn’t get it to learn at all!)

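How big is the difference in accuracy? If it’s small, it could just be run-to-run noise, for example from the random initialization of the classification head or from different data shuffling.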
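One way to check is to train each model with a few different random seeds (for example via transformers.set_seed before each run) and compare the spread. Here is a minimal sketch, assuming you have already collected one accuracy per seed for each model. The helper names and the "2 standard deviations" threshold are my own rough rule of thumb, not a formal significance test:

```python
import statistics


def summarize_runs(accuracies):
    """Return the mean and sample standard deviation of accuracies from repeated runs."""
    return statistics.mean(accuracies), statistics.stdev(accuracies)


def gap_looks_like_noise(custom_accs, official_accs, k=2.0):
    """Rough heuristic (hypothetical helper): treat the gap as noise if the
    difference in mean accuracy is smaller than k times the larger of the
    two run-to-run standard deviations. Needs at least 2 runs per model."""
    custom_mean, custom_sd = summarize_runs(custom_accs)
    official_mean, official_sd = summarize_runs(official_accs)
    gap = abs(custom_mean - official_mean)
    return gap < k * max(custom_sd, official_sd)
```

If the gap is well outside the seed-to-seed spread, it’s probably a real difference between the two setups rather than luck.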

When you fine-tune, are you freezing the main BERT layers? By default every parameter is trainable, so fine-tuning propagates gradients back into the main layers too, which might not be what you want. I’m not sure that would behave any differently with the official SequenceClassification head, though.
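A minimal sketch of freezing the encoder, assuming PyTorch and Hugging Face transformers. A tiny, randomly initialized BertConfig stands in for a pretrained checkpoint so the snippet runs offline. The same loop over model.bert.parameters() should work for a custom module, assuming it stores its BertModel as self.bert:

```python
import torch
from transformers import BertConfig, BertForSequenceClassification

# Tiny random config so this runs without downloading weights; in practice you
# would load a checkpoint, e.g. BertForSequenceClassification.from_pretrained(...).
config = BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                    num_attention_heads=2, intermediate_size=64, num_labels=2)
model = BertForSequenceClassification(config)

# Freeze the encoder (embeddings, transformer layers, pooler);
# only the classification head keeps requires_grad=True.
for param in model.bert.parameters():
    param.requires_grad = False

trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['classifier.weight', 'classifier.bias']

# Give the optimizer only the parameters that are still trainable.
optimizer = torch.optim.AdamW((p for p in model.parameters() if p.requires_grad), lr=1e-3)
```

If you freeze in one model but not the other, the two will not be comparable, so it’s worth checking the trainable list for both.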

Have you looked at the code used for the official SequenceClassification head? This post, “Which loss function in bertforsequenceclassification regression”, includes a link to the GitHub page for the code.
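For comparison, here is a sketch of a custom head laid out the way BertForSequenceClassification does it, as far as I understand that code: pooled output, dropout, one linear layer, with MSELoss when there is a single label and CrossEntropyLoss otherwise (the multi-label case is left out). The class name is hypothetical, so please check the details against the actual source. Two common sources of mismatch are using last_hidden_state[:, 0] instead of pooler_output, and leaving the new nn.Linear with PyTorch’s default initialization instead of BERT’s normal(0, initializer_range):

```python
import torch
from torch import nn
from transformers import BertConfig, BertModel


class CustomBertClassifier(nn.Module):
    """Hypothetical custom model mirroring the layout of BertForSequenceClassification:
    pooled [CLS] output -> dropout -> linear layer."""

    def __init__(self, config, num_labels):
        super().__init__()
        self.num_labels = num_labels
        self.bert = BertModel(config)  # includes the pooler by default
        # Use classifier_dropout if the config defines it, else hidden_dropout_prob.
        p = getattr(config, "classifier_dropout", None)
        self.dropout = nn.Dropout(p if p is not None else config.hidden_dropout_prob)
        self.classifier = nn.Linear(config.hidden_size, num_labels)
        # Initialize like BERT's own layers rather than nn.Linear's default.
        nn.init.normal_(self.classifier.weight, mean=0.0, std=config.initializer_range)
        nn.init.zeros_(self.classifier.bias)

    def forward(self, input_ids, attention_mask=None, labels=None):
        outputs = self.bert(input_ids=input_ids, attention_mask=attention_mask)
        pooled = outputs.pooler_output  # tanh(dense(h_[CLS])), not raw last_hidden_state[:, 0]
        logits = self.classifier(self.dropout(pooled))
        loss = None
        if labels is not None:
            if self.num_labels == 1:
                # Regression: mean-squared error on the single output.
                loss = nn.MSELoss()(logits.squeeze(-1), labels.float())
            else:
                # Single-label classification: cross-entropy over the classes.
                loss = nn.CrossEntropyLoss()(logits.view(-1, self.num_labels), labels.view(-1))
        return loss, logits
```

Usage with a tiny random config: CustomBertClassifier(BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2, num_attention_heads=2, intermediate_size=64), num_labels=3). Once the head, dropout, initialization and loss all match, any remaining gap is more likely down to training settings or seeds.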