I would like to change the number of labels that a trained model has. I am loading a model that was trained on 17 classes, and I would like to adapt this model to my own task. If I simply change the number of labels like this:
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint,num_labels=2)
I get an error saying:
RuntimeError: Error(s) in loading state_dict for BertForTokenClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([17, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
size mismatch for classifier.bias: copying a param with shape torch.Size([17]) from checkpoint, the shape in current model is torch.Size([2]).
My question is: How do I replace the classification head?
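For context, the same kind of error can be reproduced with plain PyTorch. This is only a minimal sketch: the two nn.Linear layers are stand-ins (an assumption, not the actual checkpoint) for the 17-label classifier stored in the checkpoint and the 2-label classifier of the newly built model.

```python
from torch import nn

# Stand-in for the classifier stored in the checkpoint (17 labels).
checkpoint_head = nn.Linear(768, 17)
# Stand-in for the classifier created when num_labels=2 is requested.
new_head = nn.Linear(768, 2)

try:
    # Shapes differ ([17, 768] vs [2, 768]), so strict loading fails.
    new_head.load_state_dict(checkpoint_head.state_dict())
    error_message = ""
except RuntimeError as err:
    error_message = str(err)

print(error_message)
```

The encoder weights are not the problem here. Only the final layer, whose shape depends on the number of labels, cannot be copied.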
Hi, from what I can see, the weights you are using are already fine-tuned for token classification (the classifier has been trained for that task). I recommend fine-tuning from the base BERT checkpoint instead, like this:
from transformers import AutoModelForTokenClassification
model = AutoModelForTokenClassification.from_pretrained('bert-base-uncased', num_labels=2)
# Start your own training
Or, if you want to write your own model with a custom classifier head, as requested:
import torch.nn as nn
from transformers import AutoModel

class PosModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.base_model = AutoModel.from_pretrained('bert-base-uncased')
        self.dropout = nn.Dropout(0.5)
        self.linear = nn.Linear(768, 2)  # BERT outputs 768 features; 2 is your number of labels

    def forward(self, input_ids, attn_mask):
        outputs = self.base_model(input_ids, attention_mask=attn_mask)
        # You write your new head here.
        # outputs[0] is the last hidden state: (batch, seq_len, 768)
        hidden = self.dropout(outputs[0])
        logits = self.linear(hidden)
        return logits

model = PosModel()
That is a good solution if you are training a new model on top of a language model like bert-base-uncased. However, I am trying to replace the classification head of a model that has already been fine-tuned. Running your first snippet with a checkpoint that was fine-tuned for token classification results in an error message (see my example above).
Running your second snippet will add a new classification layer on top of the existing one when you run it with a POS model like “vblagoje/bert-english-uncased-finetuned-pos”.
Unfortunately, the author of the model did not specify what it was fine-tuned on, so I have no idea myself. But this model has already been fine-tuned, therefore you cannot continue fine-tuning it.
Technically, it should not be an issue to remove a classification head. This is the main idea of transfer learning, and people do this all the time with CNNs. I would like to check whether this works for NLP as well.
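As a rough sketch of that idea, the head of a token classification model can be swapped by assigning a fresh nn.Linear to model.classifier. Two assumptions here: the tiny, randomly initialized BertConfig is only a stand-in so the example runs without downloading anything (in practice you would load the fine-tuned checkpoint with from_pretrained), and replace_head is a hypothetical helper name, not part of transformers.

```python
import torch
from torch import nn
from transformers import BertConfig, BertForTokenClassification

# Tiny random-weight stand-in for a checkpoint fine-tuned on 17 labels (assumption).
config = BertConfig(hidden_size=32, num_hidden_layers=2, num_attention_heads=2,
                    intermediate_size=64, vocab_size=100, num_labels=17)
model = BertForTokenClassification(config)


def replace_head(model, num_labels):
    """Hypothetical helper: swap the linear classifier for a newly initialized one."""
    model.classifier = nn.Linear(model.config.hidden_size, num_labels)
    # Keep the bookkeeping attributes in sync; the loss computation uses num_labels.
    model.num_labels = num_labels
    model.config.num_labels = num_labels
    return model


model = replace_head(model, num_labels=2)

# One sequence of 8 random token ids, just to check the output shape.
input_ids = torch.randint(0, 100, (1, 8))
logits = model(input_ids=input_ids).logits
print(logits.shape)
```

The encoder keeps its weights and only the new linear layer starts from random initialization, so the model still has to be trained on the new task before its predictions mean anything.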
The approaches for changing the classification head suggested above do not work in my case. Instead, I suggest swapping the body rather than the head, which works for me:
from transformers import BertForSequenceClassification
old_model = BertForSequenceClassification.from_pretrained("model-x")
new_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=HowMany_LABELS_I_WANT)
new_model.bert = old_model.bert
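One thing to keep in mind with the body swap: plain assignment makes both models share the same encoder object, so training one also changes the other. Below is a small sketch of a variation that copies the weights with load_state_dict instead. It is not the exact code above, and the tiny random configs (built by the hypothetical tiny_config helper) are stand-ins so that nothing has to be downloaded.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification


def tiny_config(num_labels):
    # Hypothetical helper: a very small BERT so the sketch runs quickly offline.
    return BertConfig(hidden_size=32, num_hidden_layers=2, num_attention_heads=2,
                      intermediate_size=64, vocab_size=100, num_labels=num_labels)


old_model = BertForSequenceClassification(tiny_config(17))  # stand-in for the fine-tuned model
new_model = BertForSequenceClassification(tiny_config(3))   # stand-in for the model with a new head

# Copy the encoder weights instead of sharing the module object.
new_model.bert.load_state_dict(old_model.bert.state_dict())

same_weights = all(
    torch.equal(p_old, p_new)
    for p_old, p_new in zip(old_model.bert.parameters(), new_model.bert.parameters())
)
shares_module = new_model.bert is old_model.bert
print(same_weights, shares_module, new_model.classifier.out_features)
```

If you do not need the old model afterwards, the direct assignment is fine and saves memory.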
This is now possible (thanks to @sgugger) by passing in an additional argument called ignore_mismatched_sizes, which you can set to True.
If you have an already fine-tuned model with, let’s say, 17 labels, and you want to replace the head with one that has 10 outputs, you can do it as follows:
from transformers import BertForTokenClassification
model_name = "vblagoje/bert-english-uncased-finetuned-pos"
model = BertForTokenClassification.from_pretrained(model_name, num_labels=10, ignore_mismatched_sizes=True)
This will print the following warning:
Some weights of BertForTokenClassification were not initialized from the model checkpoint at vblagoje/bert-english-uncased-finetuned-pos and are newly initialized because the shapes did not match:
- classifier.weight: found shape torch.Size([17, 768]) in the checkpoint and torch.Size([10, 768]) in the model instantiated
- classifier.bias: found shape torch.Size([17]) in the checkpoint and torch.Size([10]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
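If you want to check what ignore_mismatched_sizes does without downloading a model, here is a small offline sketch. The assumption is that a tiny, randomly initialized BERT saved to a temporary directory stands in for the real fine-tuned checkpoint. It is reloaded with a different num_labels, and then the head shape and one encoder weight are compared.

```python
import tempfile

import torch
from transformers import BertConfig, BertForTokenClassification

# Tiny random-weight stand-in for a checkpoint with 17 labels (assumption).
config = BertConfig(hidden_size=32, num_hidden_layers=2, num_attention_heads=2,
                    intermediate_size=64, vocab_size=100, num_labels=17)
source = BertForTokenClassification(config)

with tempfile.TemporaryDirectory() as checkpoint_dir:
    source.save_pretrained(checkpoint_dir)
    # Reload with 10 labels; the mismatched classifier is re-initialized.
    model = BertForTokenClassification.from_pretrained(
        checkpoint_dir, num_labels=10, ignore_mismatched_sizes=True
    )

head_shape = tuple(model.classifier.weight.shape)
# The encoder weights should come from the checkpoint unchanged.
encoder_reused = torch.equal(
    source.bert.embeddings.word_embeddings.weight,
    model.bert.embeddings.word_embeddings.weight,
)
print(head_shape, encoder_reused)
```

Only the classifier is newly initialized, which is why the warning asks you to train the model before using it for predictions.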