Why does BertForSequenceClassification perform worse than BertModel+Linear?

jimzhou · February 24, 2022, 2:29am

In my experiments, I trained a simple sentiment classification model on the SST dataset. Interestingly, the model hardly converges with BertForSequenceClassification, but it converges easily with a simple BertModel [CLS] output + Linear. Has anyone else run into this, and could you explain which part of the pooler, the tanh or the pretrained linear layer, causes the problem?
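To make the two setups concrete, here is a minimal sketch of what each classification head receives as input. It assumes a tiny, randomly initialised `BertConfig` (the sizes below are arbitrary, chosen only so nothing needs to be downloaded) and random token ids, so it says nothing about which component hurts convergence. It only shows that `BertForSequenceClassification` classifies the pooler output, `tanh(dense(h_[CLS]))`, while the BertModel+Linear variant classifies the raw [CLS] hidden state. The name `linear_on_cls` is hypothetical.

```python
import torch
from torch import nn
from transformers import BertConfig, BertForSequenceClassification

torch.manual_seed(0)

# Assumption: a tiny random-init config for illustration, not the pretrained checkpoint.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
)

clf = BertForSequenceClassification(config).eval()  # eval() disables dropout
encoder = clf.bert  # the underlying BertModel, including its pooler

input_ids = torch.randint(0, config.vocab_size, (4, 16))
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    out = encoder(input_ids=input_ids, attention_mask=attention_mask)

    # Setup A: raw [CLS] hidden state, as in BertModel + Linear.
    raw_cls = out.last_hidden_state[:, 0]

    # Setup B: pooler output, which is what BertForSequenceClassification uses.
    pooled = out.pooler_output

    # The pooler is a dense layer followed by tanh, applied to the [CLS] state.
    manual_pooled = torch.tanh(encoder.pooler.dense(raw_cls))

    # In eval mode the built-in head is just a Linear on the pooled vector.
    builtin_logits = clf(input_ids=input_ids, attention_mask=attention_mask).logits
    manual_logits = clf.classifier(pooled)

    # Hypothetical custom head for setup A: a fresh Linear on the raw [CLS] state.
    linear_on_cls = nn.Linear(config.hidden_size, config.num_labels)
    custom_logits = linear_on_cls(raw_cls)

print(torch.allclose(manual_pooled, pooled, atol=1e-6))
print(torch.allclose(manual_logits, builtin_logits, atol=1e-6))
print(custom_logits.shape)
```

With a pretrained checkpoint, `encoder.pooler.dense` would carry pretrained weights, whereas `linear_on_cls` always starts from random initialisation. Comparing the two setups under the same learning rate, dropout, and seed would be one way to isolate which part matters.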
