Metrics mismatch between BertForSequenceClassification Class and my custom Bert Classification

adrshkm · December 10, 2020, 9:30pm

Hi All,

I implemented a custom BERT binary classification model class by adding a classifier layer on top of the BERT model (attached below). However, the accuracy/metrics are significantly different from what I get when I train with the official BertForSequenceClassification model, which makes me wonder if I am missing something in my class.

A few doubts I have:

When loading the official BertForSequenceClassification with from_pretrained, are the classifier's weights also initialized from the pretrained model, or are they randomly initialized? In my custom class they are randomly initialized.

import torch.nn as nn
from transformers import AutoConfig, AutoModel

class MyCustomBertClassification(nn.Module):
    # num_labels and hidden_dropout_prob are keyword-only so that they can
    # follow the defaulted encoder argument
    def __init__(self, encoder='bert-base-uncased', *,
                       num_labels,
                       hidden_dropout_prob):
        super(MyCustomBertClassification, self).__init__()
        self.config  = AutoConfig.from_pretrained(encoder)
        # from_pretrained takes the checkpoint name or path, not a config object
        self.encoder = AutoModel.from_pretrained(encoder)
        self.dropout = nn.Dropout(hidden_dropout_prob)
        self.classifier = nn.Linear(self.config.hidden_size, num_labels)

    def forward(self, input_sent):
        outputs = self.encoder(input_ids=input_sent['input_ids'],
                               attention_mask=input_sent['attention_mask'],
                               token_type_ids=input_sent['token_type_ids'],
                               return_dict=True)

        # outputs[1] is the pooled [CLS] representation (pooler_output)
        pooled_output = self.dropout(outputs[1])
        # for both tasks
        logits = self.classifier(pooled_output)

        return logits

rgwatwormhill · December 10, 2020, 11:32pm

Hi adrshkm,

The weights of the SequenceClassification head are initialized randomly.

See this page: Fine-tune a pretrained model

which says:

When we instantiate a model with from_pretrained() , the model configuration and pre-trained weights of the specified model are used to initialize the model. The library also includes a number of task-specific final layers or ‘heads’ whose weights are instantiated randomly when not present in the specified pre-trained model. For example, instantiating a model with BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2) will create a BERT model instance with encoder weights copied from the bert-base-uncased model and a randomly initialized sequence classification head on top of the encoder with an output size of 2.
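To make "randomly initialized" a bit more concrete: as far as I can tell from the library source, the Bert classes draw the weights of new linear layers from a normal distribution with standard deviation `config.initializer_range` (0.02 in the bert-base-uncased config) and set the bias to zero, whereas a plain `nn.Linear` uses PyTorch's default uniform scheme. A minimal sketch of matching that in a custom head, assuming those values (`init_head_like_bert` is a hypothetical helper, not a library function):

```python
import torch
import torch.nn as nn

def init_head_like_bert(linear: nn.Linear, initializer_range: float = 0.02) -> nn.Linear:
    """Hypothetical helper: normal weights and zero bias.

    The 0.02 default is an assumption taken from the bert-base-uncased config.
    """
    nn.init.normal_(linear.weight, mean=0.0, std=initializer_range)
    nn.init.zeros_(linear.bias)
    return linear

torch.manual_seed(0)
default_head = nn.Linear(768, 2)                          # PyTorch default init
bert_style_head = init_head_like_bert(nn.Linear(768, 2))  # normal(0, 0.02), zero bias

# Compare the spread of the starting weights under the two schemes
print("default std:   ", default_head.weight.std().item())
print("bert-style std:", bert_style_head.weight.std().item())
```

For a hidden size of 768 the two schemes give weights of a similar scale, so this alone probably would not explain a large gap in accuracy, but it does remove one difference between the two setups.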

adrshkm · December 10, 2020, 11:35pm

This makes sense. In that case, why is my custom BERT classification model’s accuracy lower than the official BertForSequenceClassification?

rgwatwormhill · December 10, 2020, 11:44pm

It’s a good question, but I don’t know the answer, sorry.

(When I tried to add a custom head to a BERT model, I couldn’t get it to learn at all!)

How different is the accuracy? If it’s only a little, it could just be random chance.
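One way to check for chance is to repeat both runs with a few different seeds and compare the spread of the results. A minimal sketch showing that the head's starting weights depend only on the seed (`make_head` is a hypothetical helper; 768 and 2 are assumed sizes for bert-base-uncased with binary labels):

```python
import torch
import torch.nn as nn

def make_head(seed: int, hidden_size: int = 768, num_labels: int = 2) -> nn.Linear:
    """Hypothetical helper: build a classifier head under a fixed seed."""
    torch.manual_seed(seed)
    return nn.Linear(hidden_size, num_labels)

same_a, same_b = make_head(42), make_head(42)
other = make_head(43)

# Same seed gives identical starting weights; a different seed does not
print(torch.equal(same_a.weight, same_b.weight))  # True
print(torch.equal(same_a.weight, other.weight))   # False
```

Note that dropout and data shuffling also draw from the random number generator, so the seed would need to be set before each full training run, not just before creating the head.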

When you fine-tune, are you freezing the main BERT layers? I think by default fine-tuning will propagate back into the main layers, which might not be what you want. Not sure that would be any different with the official SequenceClassification head though.
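For reference, freezing the encoder so that only the head is trained could look roughly like this. It is a sketch with a small stand-in module instead of the real BERT encoder (`TinyClassifier` and its sizes are made up for illustration); with the class from the first post, the same loop would run over `model.encoder.parameters()`.

```python
import torch.nn as nn

class TinyClassifier(nn.Module):
    """Stand-in for an encoder plus classifier head (illustrative only)."""

    def __init__(self, hidden_size: int = 16, num_labels: int = 2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(hidden_size, hidden_size), nn.Tanh())
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, x):
        return self.classifier(self.encoder(x))

model = TinyClassifier()

# Freeze the encoder: its parameters no longer receive gradient updates
for param in model.encoder.parameters():
    param.requires_grad = False

# Only the head's parameters remain trainable
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # ['classifier.weight', 'classifier.bias']
```

If one model is trained with a frozen encoder and the other is not, the metrics would be expected to differ, so it is worth checking that both setups treat the encoder the same way.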

Have you looked at the code that is used for the official SequenceClassification head? The post "Which loss function in bertforsequenceclassification regression" includes a link to the GitHub page for the code.
