I implemented my own BERT binary classification model class by adding a classifier layer on top of BertModel (attached below). However, the accuracy/metrics are significantly different from what I get when I train with the official BertForSequenceClassification model, which makes me wonder if I am missing something in my class.
A few doubts I have:
When loading the official BertForSequenceClassification with from_pretrained, are the classifier's weights also initialized from the pretrained model, or are they randomly initialized? In my custom class they are randomly initialized.
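For reference, a custom class along these lines usually looks something like the following. This is a minimal illustrative sketch, not the exact code from the question; the class name, dropout probability, and label count are assumptions:

```python
import torch.nn as nn
from transformers import BertModel

class CustomBertClassifier(nn.Module):
    """Sketch: pretrained BertModel encoder plus a randomly initialized head."""

    def __init__(self, num_labels=2, dropout_prob=0.1):
        super().__init__()
        # Encoder weights come from the pretrained checkpoint.
        self.bert = BertModel.from_pretrained("bert-base-uncased")
        self.dropout = nn.Dropout(dropout_prob)
        # The classifier head is a fresh nn.Linear, so its weights are random.
        self.classifier = nn.Linear(self.bert.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask=None, token_type_ids=None):
        outputs = self.bert(
            input_ids,
            attention_mask=attention_mask,
            token_type_ids=token_type_ids,
        )
        # pooler_output is the [CLS] representation passed through a dense+tanh
        # layer, which is also what BertForSequenceClassification feeds its head.
        pooled = self.dropout(outputs.pooler_output)
        return self.classifier(pooled)
```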
When we instantiate a model with from_pretrained(), the model configuration and pre-trained weights of the specified model are used to initialize the model. The library also includes a number of task-specific final layers or ‘heads’ whose weights are instantiated randomly when not present in the specified pre-trained model. For example, instantiating a model with BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2) will create a BERT model instance with encoder weights copied from the bert-base-uncased model and a randomly initialized sequence classification head on top of the encoder with an output size of 2.
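You can see this in practice when loading the model: transformers typically logs a warning listing the head parameters that were newly (randomly) initialized. A short sketch, with the warning text paraphrased since the exact wording varies by library version:

```python
from transformers import BertForSequenceClassification

# Encoder weights are copied from the checkpoint; the classification head is new.
model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# transformers usually logs a warning along the lines of:
#   "Some weights of BertForSequenceClassification were not initialized from the
#    model checkpoint ... and are newly initialized:
#    ['classifier.weight', 'classifier.bias']"
# i.e. the classifier head starts from random weights and must be fine-tuned.
```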
It’s a good question, but I don’t know the answer, sorry.
(When I tried to add a custom head to a BERT model, I couldn't get it to learn at all!)
How different is the accuracy? If it's only a little, it could just be random chance.
When you fine-tune, are you freezing the main BERT layers? By default, fine-tuning will propagate gradients back into the main encoder layers as well, which might not be what you want (see the sketch below). Not sure that would be any different with the official SequenceClassification head, though.
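If you do want to freeze the encoder and train only the head, a common pattern is to turn off requires_grad on the encoder's parameters. A minimal sketch, assuming the model exposes its encoder as the .bert attribute (as BertForSequenceClassification does):

```python
from transformers import BertForSequenceClassification

model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Freeze the BERT encoder so gradients only update the classification head.
for param in model.bert.parameters():
    param.requires_grad = False

# Sanity check: only the head's parameters remain trainable.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)  # e.g. ['classifier.weight', 'classifier.bias']
```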