How do I change the classification head of a model?

I would like to change the number of labels of a trained model. I am loading a model that was trained on 17 classes and would like to adapt it to my own task. Now, if I simply change the number of labels like this:

from transformers import AutoModelForTokenClassification

model_checkpoint = "vblagoje/bert-english-uncased-finetuned-pos"
model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=2)

I get an error saying:

RuntimeError: Error(s) in loading state_dict for BertForTokenClassification:
        size mismatch for classifier.weight: copying a param with shape torch.Size([17, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).
        size mismatch for classifier.bias: copying a param with shape torch.Size([17]) from checkpoint, the shape in current model is torch.Size([2]).

My question is: How do I replace the classification head?

Thanks a lot :hugs:


The reason is that you are trying to use a model that has already been fine-tuned on a particular classification task. You have to remove the last part (the classification head) of the model.

This is also partly a consequence of the design. In practice, your model is

(BERT base uncased + classification head) = new model.

If you want to reuse it for a different task, either start from BERT base uncased or extract that part from the new model.

I wanted to test whether training on the POS task results in better scores compared to using just the plain BERT base.

So my question is: How do I extract the “base” part from a trained model and add a new head?

I am not sure :slight_smile:

Hi, from what I noticed, the weights you're using are already fine-tuned for token classification (the classifier has been trained for that task). I recommend you fine-tune from bert-base-uncased instead, like this:

from transformers import AutoModelForTokenClassification
model = AutoModelForTokenClassification.from_pretrained('bert-base-uncased', num_labels=2)
# Start your own training

Or, if you want to write your own model with a custom classifier head, as requested:

import torch.nn as nn
from transformers import AutoModel

class PosModel(nn.Module):
    def __init__(self):
        super().__init__()
        self.base_model = AutoModel.from_pretrained('bert-base-uncased')
        self.dropout = nn.Dropout(0.5)
        self.linear = nn.Linear(768, 2)  # BERT outputs 768 features; 2 is your number of labels

    def forward(self, input_ids, attn_mask):
        outputs = self.base_model(input_ids, attention_mask=attn_mask)
        # Write your new head here; outputs[0] is the last hidden state
        outputs = self.dropout(outputs[0])
        outputs = self.linear(outputs)
        return outputs

model = PosModel().to('cuda')

I am assuming you're using PyTorch for this.

This is a good solution if you train a new model from scratch on top of an LM like bert-base-uncased. However, I am trying to replace the classification head of an already fine-tuned model. Running your first snippet with a model pre-trained for token classification results in an error message (see my sample above).
Running your second snippet with a POS model like “vblagoje/bert-english-uncased-finetuned-pos” will add a new classification layer on top of the existing one.
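
One way to check whether the second snippet really stacks a new layer on top of the old head: as far as I know, AutoModel.from_pretrained resolves to the bare encoder class and skips the head weights stored in the checkpoint, so PosModel above may in fact end up with the fine-tuned body and only the new linear layer. The sketch below tests this offline. The tiny BertConfig sizes and the temporary directory are assumptions made so that nothing has to be downloaded; they stand in for the real POS checkpoint.

```python
import tempfile

import torch
from transformers import AutoModel, BertConfig, BertForTokenClassification

# Tiny, randomly initialised stand-in for a fine-tuned POS checkpoint
# (hypothetical sizes, chosen only so the example runs without a download).
tiny_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=17,
)
pos_model = BertForTokenClassification(tiny_config)

with tempfile.TemporaryDirectory() as checkpoint_dir:
    pos_model.save_pretrained(checkpoint_dir)
    # AutoModel picks the encoder-only class for this config.
    body = AutoModel.from_pretrained(checkpoint_dir)

# Does the loaded body still carry the 17-label classifier?
has_classifier = hasattr(body, "classifier")
# Are the encoder weights the ones from the checkpoint?
same_embeddings = torch.equal(
    body.embeddings.word_embeddings.weight,
    pos_model.bert.embeddings.word_embeddings.weight,
)
print(type(body).__name__, has_classifier, same_embeddings)
```

If the same holds for the real checkpoint, passing its name to AutoModel.from_pretrained inside PosModel would be one way to extract the "base" part.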

Your error is basically a mismatch in the final layer, which is the classifier:

vblagoje/bert-english-uncased-finetuned-pos - these weights are already fine-tuned for 17 classes

bert-base-uncased - this one isn't fine-tuned for anything yet, so it is what you want to use for fine-tuning on just your 2 classes

Unfortunately, the author of the model did not specify what it was fine-tuned on, so I have no idea myself. But this model has already been fine-tuned with a 17-class head, so you cannot simply continue fine-tuning it with a different number of labels.


Technically, it should not be an issue to remove a classification head. This is the main idea of transfer learning, and people do this all the time with CNNs. I would like to check whether this works for NLP as well.

I mean, in this case, when you load the weights, the checkpoint expects the linear layer to have 17 outputs but you specified 2. This is where the error comes from:

size mismatch for classifier.weight: copying a param with shape torch.Size([17, 768]) from checkpoint, the shape in current model is torch.Size([2, 768]).

Well, technically you can do that by replacing the last layer after loading the model and then fine-tuning it again.

# did not test this out
import torch.nn as nn

model = AutoModelForTokenClassification.from_pretrained(model_checkpoint, num_labels=17)
model.classifier = nn.Linear(768, 2)
# Run training

Well, I tried that and added two lines to set the correct number of labels:

model.classifier = nn.Linear(786,1)
model.num_labels = 2
model.config.num_labels = 2

Printing the model shows that this worked:

  (dropout): Dropout(p=0.1, inplace=False)
  (classifier): Linear(in_features=786, out_features=1, bias=True)

However, running this with the trainer class results in this error:

mat1 dim 1 must match mat2 dim 0

I suspect that I missed something or broke some clever autoconfiguration. That's why I wrote my question here.

model.classifier = nn.Linear(786,1) # you have 1 class? I think you should change it to 2 
model.num_labels = 2 # while here you specify 2 classes so its a bit confusing

Unless you are aiming for a sigmoid on your last layer and that is why you're using a single output. In that case, I think you need to change your loss function to BCEWithLogitsLoss.
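
A further possible cause of the "mat1 dim 1 must match mat2 dim 0" error: the snippet uses 786 as the input size, while BERT base produces 768 features, so the matrix multiplication in the new layer cannot line up. Reading the size from the config avoids the typo. Here is a minimal sketch, assuming a tiny randomly initialised BertConfig (hypothetical sizes) in place of the real checkpoint so that it runs offline:

```python
import torch
import torch.nn as nn
from transformers import BertConfig, BertForTokenClassification

# Tiny stand-in for the 17-label POS model (hypothetical sizes, no download).
tiny_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=17,
)
model = BertForTokenClassification(tiny_config)

new_num_labels = 2
# Take the input size from the config instead of typing 768 by hand,
# and keep the output size consistent with num_labels.
model.classifier = nn.Linear(model.config.hidden_size, new_num_labels)
model.num_labels = new_num_labels
model.config.num_labels = new_num_labels

# Dummy batch: 1 sequence of 5 tokens with one label per token.
input_ids = torch.randint(0, tiny_config.vocab_size, (1, 5))
labels = torch.randint(0, new_num_labels, (1, 5))

model.eval()
with torch.no_grad():
    outputs = model(input_ids=input_ids, labels=labels)

logits_shape = tuple(outputs.logits.shape)
print(logits_shape, outputs.loss.item())
```

With the real model the same three assignments should apply after from_pretrained; the new layer is randomly initialised and still needs training.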


Check this. It's neatly implemented here.
And it's a pretty neat architecture as well.

model —> downstream_task . Well defined.

Thanks for your help! It was way too late when I wrote this :see_no_evil:

It turned out that the easiest way to solve my initial question is this:

config = AutoConfig.from_pretrained(model_checkpoint)
config.num_labels = 2
model = AutoModelForTokenClassification.from_config(config)

It works, but how does this change affect the model architecture and the results? It would be great if anyone could explain the intuition behind this.

I don’t think this solved your problem. Initialising the model with ‘from_config’ only applies the model configuration; it does not load the model weights, so the whole model starts from random initialisation.
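
A small sketch that illustrates the difference, assuming a tiny randomly initialised model saved to a temporary directory as a stand-in for the fine-tuned checkpoint (the sizes are hypothetical and only there to avoid a download):

```python
import tempfile

import torch
from transformers import (
    AutoConfig,
    AutoModelForTokenClassification,
    BertConfig,
    BertForTokenClassification,
)

# Tiny stand-in for the fine-tuned checkpoint (hypothetical sizes).
tiny_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=17,
)
trained = BertForTokenClassification(tiny_config)
reference = trained.bert.embeddings.word_embeddings.weight

with tempfile.TemporaryDirectory() as checkpoint_dir:
    trained.save_pretrained(checkpoint_dir)

    # Route 1: config only. The architecture is built, no weights are read.
    config = AutoConfig.from_pretrained(checkpoint_dir)
    config.num_labels = 2
    built = AutoModelForTokenClassification.from_config(config)

    # Route 2: from_pretrained reads the stored weights.
    loaded = AutoModelForTokenClassification.from_pretrained(checkpoint_dir)

built_keeps_weights = torch.equal(
    built.bert.embeddings.word_embeddings.weight, reference
)
loaded_keeps_weights = torch.equal(
    loaded.bert.embeddings.word_embeddings.weight, reference
)
print(built_keeps_weights, loaded_keeps_weights)
```

So the from_config route gives the right head shape but discards the fine-tuned body, which is the part worth keeping here.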

Does anyone know how to solve this problem?

The head-replacement approaches suggested above do not work in my case. Instead, I suggest replacing the body rather than the head, as follows:

old_model = BertForSequenceClassification.from_pretrained("model-x")
new_model = BertForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=HowMany_LABELS_I_WANT)
new_model.bert = old_model.bert

This works for me.
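
A possible variation on this body swap, sketched with tiny randomly initialised models (hypothetical sizes, no download): copying the encoder's state dict instead of assigning the module, so that the two models do not share parameters afterwards. This is an alternative under those assumptions, not what the snippet above does.

```python
import torch
from transformers import BertConfig, BertForSequenceClassification


def tiny_config(num_labels):
    # Hypothetical small sizes so the example runs offline.
    return BertConfig(
        vocab_size=100,
        hidden_size=32,
        num_hidden_layers=2,
        num_attention_heads=2,
        intermediate_size=64,
        num_labels=num_labels,
    )


old_model = BertForSequenceClassification(tiny_config(17))  # stands in for "model-x"
new_model = BertForSequenceClassification(tiny_config(3))   # fresh head with 3 labels

# Copy the encoder weights; the classifier of new_model is left untouched.
new_model.bert.load_state_dict(old_model.bert.state_dict())

old_weight = old_model.bert.embeddings.word_embeddings.weight
new_weight = new_model.bert.embeddings.word_embeddings.weight
same_values = torch.equal(old_weight, new_weight)
shared_storage = old_weight.data_ptr() == new_weight.data_ptr()
print(same_values, shared_storage, new_model.classifier.out_features)
```

With plain assignment (new_model.bert = old_model.bert) both models point at the same encoder object, so training one also changes the other; that is fine if old_model is discarded afterwards.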

This is now possible (thanks to @sgugger) by passing in an additional argument called ignore_mismatched_sizes, which you can set to True.

If you have an already fine-tuned model with, let’s say 17 labels, and you want to replace the head with one that has 10 outputs, you can do it as follows:

from transformers import BertForTokenClassification

model_name = "vblagoje/bert-english-uncased-finetuned-pos"

model = BertForTokenClassification.from_pretrained(model_name, num_labels=10, ignore_mismatched_sizes=True)

This will print the following warning:

Some weights of BertForTokenClassification were not initialized from the model checkpoint at vblagoje/bert-english-uncased-finetuned-pos and are newly initialized because the shapes did not match:
- classifier.weight: found shape torch.Size([17, 768]) in the checkpoint and torch.Size([10, 768]) in the model instantiated
- classifier.bias: found shape torch.Size([17]) in the checkpoint and torch.Size([10]) in the model instantiated
You should probably TRAIN this model on a down-stream task to be able to use it for predictions and inference.
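
To confirm that only the head is re-initialised while the body keeps its fine-tuned weights, here is a minimal offline check. It assumes a tiny randomly initialised model saved to a temporary directory in place of the real checkpoint; the sizes are hypothetical.

```python
import tempfile

import torch
from transformers import BertConfig, BertForTokenClassification

# Tiny stand-in for the 17-label checkpoint (hypothetical sizes, no download).
tiny_config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=17,
)
pos_model = BertForTokenClassification(tiny_config)

with tempfile.TemporaryDirectory() as checkpoint_dir:
    pos_model.save_pretrained(checkpoint_dir)
    # Same call as above, pointed at the local stand-in checkpoint.
    model = BertForTokenClassification.from_pretrained(
        checkpoint_dir, num_labels=10, ignore_mismatched_sizes=True
    )

# The body should be identical to the checkpoint ...
body_kept = torch.equal(
    model.bert.embeddings.word_embeddings.weight,
    pos_model.bert.embeddings.word_embeddings.weight,
)
# ... while the classifier has the new shape.
head_shape = tuple(model.classifier.weight.shape)
print(body_kept, head_shape)
```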

I guess you’re right. I’m still looking for a solution HERE