Hello,
I would like to change the number of labels that a trained model has. I am loading a model that was trained on 17 classes and I would like to adapt it to my own task. Now, if I simply change the number of labels like this:
from transformers import AutoModelForTokenClassification

model_checkpoint = "vblagoje/bert-english-uncased-finetuned-pos"
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=2)
I get an error saying:
RuntimeError: Error(s) in loading state_dict for BertForTokenClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([17, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
size mismatch for classifier.bias: copying a param with shape torch.Size([17]) from checkpoint, the shape in current model is torch.Size([2]).
My question is: How do I replace the classification head?
The reason is that you are trying to use a model which has already been pretrained on a particular classification task. You have to remove the last part (the classification head) of the model.
This is actually a kind of design fault too. In practice,
(BERT base uncased + classification head) = new model
is your model. Now, if you want to reuse it on a different task, either start from BERT base uncased or extract the encoder part from the new model.
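One way to extract that part, if I understand correctly, is to load only the encoder from the fine-tuned checkpoint with AutoModel (which does not instantiate the token-classification head, so the 17-class classifier weights are simply dropped) and then attach your own head. A rough sketch:

from transformers import AutoModel

# Loads only the BERT encoder; the checkpoint's classifier.weight / classifier.bias are not used
encoder = AutoModel.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")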
Hi, from what I noticed, the weights you are using are already fine-tuned for token classification (the classifier has been trained for that task). I recommend you fine-tune on bert-base-uncased instead, like this:
from transformers import AutoModelForTokenClassification
model = AutoModelForTokenClassification.from_pretrained('bert-base-uncased', num_labels=2)
# Start your own training
Or, if you want to write your own model with a custom classifier head, as requested:
import torch.nn as nn
from transformers import AutoModel

class PosModel(nn.Module):
    def __init__(self):
        super(PosModel, self).__init__()
        self.base_model = AutoModel.from_pretrained('bert-base-uncased')
        self.dropout = nn.Dropout(0.5)
        self.linear = nn.Linear(768, 2)  # BERT outputs 768-dimensional features; 2 is your number of labels

    def forward(self, input_ids, attn_mask):
        outputs = self.base_model(input_ids, attention_mask=attn_mask)
        # Your new head goes here
        outputs = self.dropout(outputs[0])
        outputs = self.linear(outputs)
        return outputs
model = PosModel()
model.to('cuda')
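A minimal usage sketch for this custom model (assuming a CUDA device is available, as in the snippet above):

from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained('bert-base-uncased')
batch = tokenizer("The quick brown fox", return_tensors='pt').to('cuda')
logits = model(batch['input_ids'], batch['attention_mask'])  # shape: (1, sequence_length, 2)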
This is a good solution if you train a new model from scratch based on an LM like bert-base-uncased. I am trying to replace the classification head of a model. Running your first code snippet with a model already fine-tuned for token classification results in an error message (see my example above).
Running your second code snippet with a POS model like "vblagoje/bert-english-uncased-finetuned-pos" will add a new classification layer on top of the existing one.
Your error is basically a mismatch in the final layer, which is the classifier part:
vblagoje/bert-english-uncased-finetuned-pos - these weights are already fine-tuned for 17 classes.
bert-base-uncased - this one isn't fine-tuned for anything yet, so this is what you want to use for fine-tuning on just the 2 classes you have.
Unfortunately, the author of the model did not specify what it was fine-tuned on, so I have no idea myself, but this model has already been fine-tuned, so you cannot simply continue fine-tuning it.
Technically it should not be an issue to remove a classification head. This is the main idea of transfer learning, and people do this all the time with CNNs. I would like to check whether this works for NLP as well.
I mean, in this case, when you load the weights it expects the linear layer to have a size of 17, but you specified 2; this is where the error comes from:
size mismatch for classifier.weight: copying a param with shape torch.Size([17, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
Well, technically you can do that by replacing the last layer after loading the model, and then fine-tuning it again:
# did not test this out
import torch.nn as nn

model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=17)
model.classifier = nn.Linear(768, 2)  # swap the 17-class head for a fresh 2-class head
# Run training
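If you go this route, I believe you also need to update the label count so that the model's built-in loss computation matches the new head (again untested):

model.num_labels = 2
model.config.num_labels = 2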
model.classifier = nn.Linear(786, 1)  # you have 1 class? I think you should change it to 2
model.num_labels = 2  # while here you specify 2 classes, so it's a bit confusing
Unless you are aiming for a sigmoid on your last layer and that is why you are using a single output; in that case I think you need to change your loss function to BCEWithLogitsLoss.
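Roughly, the two setups would look like this (just a sketch with plain PyTorch losses):

import torch.nn as nn

# Two classes: 2 output units + CrossEntropyLoss on integer class labels
two_class_head = nn.Linear(768, 2)
ce_loss = nn.CrossEntropyLoss()

# Single logit: 1 output unit + BCEWithLogitsLoss on float 0/1 labels
single_logit_head = nn.Linear(768, 1)
bce_loss = nn.BCEWithLogitsLoss()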
I don't think this solves your problem. Initialising a model with from_config only changes the model configuration; it does not load the model weights.
The change-the-classification-head approaches suggested above do not work for my case. Instead, I suggest changing the body instead of the head, as follows:
from transformers import BertForSequenceClassification

old_model = BertForSequenceClassification.from_pretrained("model-x")
new_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=HowMany_LABELS_I_WANT)
new_model.bert = old_model.bert
This works for me.
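The same trick should carry over to the token-classification setup from the original question, something like (untested):

from transformers import BertForTokenClassification

old_model = BertForTokenClassification.from_pretrained("vblagoje/bert-english-uncased-finetuned-pos")
new_model = BertForTokenClassification.from_pretrained("bert-base-uncased", num_labels=2)
new_model.bert = old_model.bert  # reuse the fine-tuned encoder, keep the fresh 2-label head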
This is now possible (thanks to @sgugger) by passing in an additional argument called ignore_mismatched_sizes, which you can set to True.
If you have an already fine-tuned model with, let's say, 17 labels, and you want to replace the head with one that has 10 outputs, you can do it as follows:
from transformers import BertForTokenClassification
model_name = "vblagoje/bert-english-uncased-finetuned-pos"
model = BertForTokenClassification.from_pretrained(model_name, num_labels=10, ignore_mismatched_sizes=True)
This will print the following warning:
Some weights of BertForTokenClassification were not initialized from the model checkpoint at vblagoje/bert-english-uncased-finetuned-pos and are newly initialized because the shapes did not match:
- classifier.weight: found shape torch.Size([17, 768]) in the checkpoint and torch.Size([10, 768]) in the model instantiated
- classifier.bias: found shape torch.Size([17]) in the checkpoint and torch.Size([10]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
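Applied to the 2-label case from the original question, the same pattern would be:

from transformers import AutoModelForTokenClassification

model = AutoModelForTokenClassification.from_pretrained(
    "vblagoje/bert-english-uncased-finetuned-pos",
    num_labels=2,
    ignore_mismatched_sizes=True,
)
# classifier.weight and classifier.bias are newly initialized for 2 labels, so fine-tune before inference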