Replacing the last layer of a fine-tuned model to use a different set of labels

I’m trying to fine-tune dslim/bert-base-NER using the wnut_17 dataset.
Since the number of NER labels differs (9 vs. 13), I manually replaced these parameters in the model config to get rid of the size-mismatch error:

model.config.id2label = my_id2label
model.config.label2id = my_label2id
model.config._num_labels = len(my_id2label)  # replacing 9 with 13
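A minimal sketch of why these edits alone may not be enough, assuming a recent `transformers` and using a tiny randomly initialized BERT in place of dslim/bert-base-NER (an assumption, so it runs offline). The config reports 13 labels afterwards, but the classifier `nn.Linear` built when the model was created, and the `num_labels` attribute the model cached at that point, keep the old size of 9.

```python
from transformers import BertConfig, BertForTokenClassification

# Tiny offline stand-in for dslim/bert-base-NER (assumption: same head layout, 9 labels).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=9,
)
model = BertForTokenClassification(config)

# The 13 wnut_17 labels, in the order shown in the config later in this post.
wnut_labels = [
    "O", "B-corporation", "I-corporation", "B-creative-work", "I-creative-work",
    "B-group", "I-group", "B-location", "I-location",
    "B-person", "I-person", "B-product", "I-product",
]
my_id2label = dict(enumerate(wnut_labels))
my_label2id = {label: idx for idx, label in my_id2label.items()}

# The manual edits from the post: they only change metadata on the config object.
model.config.id2label = my_id2label
model.config.label2id = my_label2id
model.config._num_labels = len(my_id2label)

print(model.config.num_labels)        # 13 (derived from id2label)
print(model.num_labels)               # 9  (cached when the model was built)
print(model.classifier.out_features)  # 9  (the Linear head was not rebuilt)
```

With a 9-output head, label ids 9 to 12 would fall outside the logits, so the layer itself has to be resized, not just the config.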

However, when training starts I get the following error, which I don't know how to handle:

Expected input batch_size (1456) to match target batch_size (1008).

Has anyone handled this manually?
@sgugger @phosseini Wouldn't it be great to have a solid function that handles head replacement for fine-tuning?
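One way such a helper could look, as a sketch only: `replace_token_classification_head` is a hypothetical name, not a `transformers` API, and initializing the new layer from a normal distribution with `config.initializer_range` and a zero bias is an assumption that mirrors how BERT initializes its own linear layers. The point it illustrates is that three things have to agree: the `classifier` layer, the model's cached `num_labels` (used when flattening logits for the loss), and the config's `id2label`/`label2id`.

```python
import torch
from torch import nn
from transformers import BertConfig, BertForTokenClassification


def replace_token_classification_head(model, id2label):
    """Hypothetical helper: give a *ForTokenClassification model a fresh head for id2label."""
    num_labels = len(id2label)
    old_head = model.classifier
    new_head = nn.Linear(old_head.in_features, num_labels)
    # Assumption: mimic BERT's init (normal with initializer_range, zero bias).
    nn.init.normal_(new_head.weight, mean=0.0, std=model.config.initializer_range)
    nn.init.zeros_(new_head.bias)
    model.classifier = new_head.to(old_head.weight.device, old_head.weight.dtype)
    # The loss flattens logits with self.num_labels, so it must match the new head.
    model.num_labels = num_labels
    model.config.id2label = dict(id2label)
    model.config.label2id = {label: idx for idx, label in id2label.items()}
    return model


# Usage on a tiny randomly initialized 9-label BERT (assumption: offline stand-in).
model = BertForTokenClassification(
    BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
               num_attention_heads=2, intermediate_size=64, num_labels=9)
)
wnut_labels = [
    "O", "B-corporation", "I-corporation", "B-creative-work", "I-creative-work",
    "B-group", "I-group", "B-location", "I-location",
    "B-person", "I-person", "B-product", "I-product",
]
model = replace_token_classification_head(model, dict(enumerate(wnut_labels)))

input_ids = torch.randint(0, 100, (2, 7))
labels = torch.randint(0, len(wnut_labels), (2, 7))
outputs = model(input_ids=input_ids, labels=labels)
print(outputs.logits.shape)  # torch.Size([2, 7, 13])
```

Unlike editing only the config, this rebuilds the layer, so both the logits and the loss work with 13 classes.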


tokenized_wnut['train'].shape = (3394, 7)
tokenized_wnut['validation'].shape = (1009, 7)

Model config after “manual” modifications:

BertConfig {
  "_name_or_path": "dslim/bert-base-NER",
  "_num_labels": 13,
  "architectures": [ ... ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "O",
    "1": "B-corporation",
    "2": "I-corporation",
    "3": "B-creative-work",
    "4": "I-creative-work",
    "5": "B-group",
    "6": "I-group",
    "7": "B-location",
    "8": "I-location",
    "9": "B-person",
    "10": "I-person",
    "11": "B-product",
    "12": "I-product"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "B-corporation": 1,
    "B-creative-work": 3,
    "B-group": 5,
    "B-location": 7,
    "B-person": 9,
    "B-product": 11,
    "I-corporation": 2,
    "I-creative-work": 4,
    "I-group": 6,
    "I-location": 8,
    "I-person": 10,
    "I-product": 12,
    "O": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.14.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28996
}


See this post, which should solve your issue.

Thanks, but it seems the issue isn't completely resolved: the suggested solutions don't work for my problem.

it’s not completely resolved.

=> Can you clarify? The error you're getting:

Expected input batch_size (1456) to match target batch_size (1008).

suggests that the problem lies in how the dataset is prepared rather than in the model's last layer.
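One detail in the numbers may help narrow this down: 1456 / 1008 = 13 / 9 exactly. That ratio is consistent with a head that already outputs 13 classes while the loss still flattens the logits with the old count of 9, as `BertForTokenClassification` does with `logits.view(-1, self.num_labels)`. The sketch reproduces the mismatch with plain PyTorch; the batch size of 8 and sequence length of 126 are hypothetical values chosen so that 8 * 126 = 1008.

```python
import torch
from torch import nn

batch_size, seq_len = 8, 126  # hypothetical: 8 * 126 = 1008 token positions
new_num_labels, old_num_labels = 13, 9

# A head that already produces 13 scores per token ...
logits = torch.randn(batch_size, seq_len, new_num_labels)
labels = torch.randint(0, new_num_labels, (batch_size, seq_len))

# ... flattened with the stale label count, mirroring logits.view(-1, self.num_labels).
flat_logits = logits.view(-1, old_num_labels)  # 8 * 126 * 13 / 9 = 1456 rows
flat_labels = labels.view(-1)                  # 1008 rows
print(flat_logits.shape[0], flat_labels.shape[0])  # 1456 1008

message = ""
try:
    nn.CrossEntropyLoss()(flat_logits, flat_labels)
except (ValueError, RuntimeError) as err:
    message = str(err)
print(message)
```

If this is the cause, keeping the model's `num_labels` in sync with the classifier's output size (for example by loading with `num_labels=...`) should remove the error.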

Thank you for the quick reply, @nielsr.
That error is resolved now, but my question remains: does simply changing the number of labels mean that we have replaced the classifier head?

By the way, this is what I had to do for my problem:

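from transformers import AutoModelForTokenClassification
model_name = "dslim/bert-base-NER"
# my_id2label / my_label2id hold the 13 wnut_17 labels (see the config above)
mymodel = AutoModelForTokenClassification.from_pretrained(
    model_name,
    num_labels=len(my_id2label),
    ignore_mismatched_sizes=True,  # re-initialize the 9-way classifier as a 13-way one
)
mymodel.config.id2label = my_id2label
mymodel.config.label2id = my_label2id
mymodel.config._num_labels = len(my_id2label)  # replacing 9 with 13
mymodel.config.num_labels = len(my_id2label)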
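A sketch of the same approach that can run offline, assuming a tiny randomly initialized 9-label BERT saved to a temporary directory stands in for dslim/bert-base-NER. Passing `id2label` and `label2id` to `from_pretrained` should set them on the loaded config, so the separate `config` assignments become optional. The final check suggests that the encoder weights come from the checkpoint while only the classifier is re-created.

```python
import tempfile

import torch
from transformers import AutoModelForTokenClassification, BertConfig, BertForTokenClassification

wnut_labels = [
    "O", "B-corporation", "I-corporation", "B-creative-work", "I-creative-work",
    "B-group", "I-group", "B-location", "I-location",
    "B-person", "I-person", "B-product", "I-product",
]
my_id2label = dict(enumerate(wnut_labels))
my_label2id = {label: idx for idx, label in my_id2label.items()}

with tempfile.TemporaryDirectory() as checkpoint_dir:
    # Offline stand-in for dslim/bert-base-NER: a tiny 9-label checkpoint (assumption).
    base = BertForTokenClassification(
        BertConfig(vocab_size=100, hidden_size=32, num_hidden_layers=1,
                   num_attention_heads=2, intermediate_size=64, num_labels=9)
    )
    base.save_pretrained(checkpoint_dir)

    mymodel = AutoModelForTokenClassification.from_pretrained(
        checkpoint_dir,
        num_labels=len(my_id2label),
        id2label=my_id2label,
        label2id=my_label2id,
        ignore_mismatched_sizes=True,  # re-create the 9 -> 13 classifier instead of raising
    )

    # Encoder weights are copied from the checkpoint; only the head is new.
    same_embeddings = torch.equal(
        base.bert.embeddings.word_embeddings.weight,
        mymodel.bert.embeddings.word_embeddings.weight,
    )

print(mymodel.classifier.out_features)  # 13
print(mymodel.config.id2label[1])       # B-corporation
print(same_embeddings)                  # True
```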

=> Yes. If you change the number of output neurons, you get a new linear layer whose weights and bias are randomly initialized.

Also, I'm not sure why you set _num_labels; you should only need num_labels.
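A small sketch of the difference, assuming a 4.x `transformers`: `num_labels` on `PretrainedConfig` is a property computed from `id2label`, while `_num_labels` appears to be a leftover key from configs saved by older versions (such as the one above) and is stored as a plain attribute that does not seem to affect the label count.

```python
from transformers import BertConfig

config = BertConfig(num_labels=9)
before = config.num_labels

# Setting the legacy key only adds an attribute; the label count is unchanged.
config._num_labels = 13
after_private_key = config.num_labels

# num_labels follows id2label (assigning a different config.num_labels would
# instead regenerate id2label with generic LABEL_i names).
config.id2label = {i: f"LABEL_{i}" for i in range(13)}
after_id2label = config.num_labels

print(before, after_private_key, after_id2label)  # 9 9 13
```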


Thanks for confirming, @nielsr.

_num_labels is the second key in the BertConfig shown above, and yes, changing it is optional.