Replacing the last layer of a fine-tuned model to use a different set of labels

Aliseyfi · December 21, 2021, 4:19am

I’m trying to fine-tune dslim/bert-base-NER on the wnut_17 dataset.
Since the number of NER labels is different (13 instead of 9), I manually replaced these parameters in the model config to get rid of the size mismatch error:

model.config.id2label = my_id2label
model.config.label2id = my_label2id
model.config._num_labels = len(my_id2label) ## replacing 9 by 13

However, when training starts I get the following error which I don’t know how to handle:

Expected input batch_size (1456) to match target batch_size (1008).

Has anyone handled this manually?
@sgugger @phosseini Wouldn’t it be great to have a solid function that handles head replacement for fine-tuning?

Shapes:

tokenized_wnut['train'].shape = (3394, 7)
tokenized_wnut['validation'].shape = (1009, 7)

Model config after “manual” modifications:

BertConfig {
  "_name_or_path": "dslim/bert-base-NER",
  "_num_labels": 13,
  "architectures": [
    "BertForTokenClassification"
  ],
  "attention_probs_dropout_prob": 0.1,
  "classifier_dropout": null,
  "hidden_act": "gelu",
  "hidden_dropout_prob": 0.1,
  "hidden_size": 768,
  "id2label": {
    "0": "O",
    "1": "B-corporation",
    "2": "I-corporation",
    "3": "B-creative-work",
    "4": "I-creative-work",
    "5": "B-group",
    "6": "I-group",
    "7": "B-location",
    "8": "I-location",
    "9": "B-person",
    "10": "I-person",
    "11": "B-product",
    "12": "I-product"
  },
  "initializer_range": 0.02,
  "intermediate_size": 3072,
  "label2id": {
    "B-corporation": 1,
    "B-creative-work": 3,
    "B-group": 5,
    "B-location": 7,
    "B-person": 9,
    "B-product": 11,
    "I-corporation": 2,
    "I-creative-work": 4,
    "I-group": 6,
    "I-location": 8,
    "I-person": 10,
    "I-product": 12,
    "O": 0
  },
  "layer_norm_eps": 1e-12,
  "max_position_embeddings": 512,
  "model_type": "bert",
  "num_attention_heads": 12,
  "num_hidden_layers": 12,
  "output_past": true,
  "pad_token_id": 0,
  "position_embedding_type": "absolute",
  "transformers_version": "4.14.1",
  "type_vocab_size": 2,
  "use_cache": true,
  "vocab_size": 28996
}

nielsr · December 21, 2021, 1:55pm

Hi,

See this post that will solve your issue

Aliseyfi · December 21, 2021, 7:15pm

Thanks, but apparently it’s not completely resolved. The solutions suggested don’t work for my problem.

nielsr · December 22, 2021, 9:39am

it’s not completely resolved.

=> can you clarify? The error that you’re getting:

Expected input batch_size (1456) to match target batch_size (1008).

suggests that it has to do with the preparation of the dataset, rather than the last layer of the model.
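For token classification, a common dataset-preparation cause of this kind of shape mismatch is that the labels are still one per word while the inputs are one per sub-word token. Below is a minimal sketch of the usual alignment step, assuming a fast tokenizer whose `word_ids()` output gives `None` for special tokens and the word index otherwise. The function name `align_labels` and the example values are hypothetical, and this may not be the exact cause in this thread.

```python
# -100 is the default ignore_index of PyTorch's CrossEntropyLoss,
# so positions labelled -100 do not contribute to the loss.
IGNORE_INDEX = -100


def align_labels(word_ids, word_labels, label_all_subtokens=False):
    """Expand word-level labels to one label per token.

    word_ids: list of Optional[int], in the form returned by a fast
        tokenizer's word_ids() (None marks special tokens).
    word_labels: list of int, one label id per original word.
    """
    aligned = []
    previous = None
    for word_id in word_ids:
        if word_id is None:
            # Special tokens such as [CLS] / [SEP] are ignored in the loss.
            aligned.append(IGNORE_INDEX)
        elif word_id != previous:
            # First sub-token of a word carries the word's label.
            aligned.append(word_labels[word_id])
        else:
            # Later sub-tokens are either ignored or repeat the label.
            aligned.append(word_labels[word_id] if label_all_subtokens else IGNORE_INDEX)
        previous = word_id
    return aligned


# Hypothetical sentence: 3 words, the second split into 2 sub-tokens,
# wrapped in [CLS] ... [SEP].
word_ids = [None, 0, 1, 1, 2, None]
word_labels = [0, 9, 0]  # O, B-person, O in the wnut_17 mapping above

aligned = align_labels(word_ids, word_labels)
print(aligned)  # [-100, 0, 9, -100, 0, -100]
```

After this step the label sequence has exactly the same length as the token sequence, which is what the loss expects once both are flattened.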

Aliseyfi · December 23, 2021, 12:57am

Thank you @nielsr for being responsive.
That error is resolved now, but the question is "does simply changing the number of labels mean that we have changed the classifier head?!"

By the way, for my problem, I had to do these modifications:

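from transformers import AutoModelForTokenClassification

model_name = "dslim/bert-base-NER"
# num_labels sizes the new classifier; ignore_mismatched_sizes skips the old 9-label head weights
mymodel = AutoModelForTokenClassification.from_pretrained(
    model_name, num_labels=len(my_id2label), ignore_mismatched_sizes=True
)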
...
mymodel.config.id2label = my_id2label
mymodel.config.label2id = my_label2id
mymodel.config._num_labels = len(my_id2label) ## replacing 9 by 13
mymodel.config.num_labels = len(my_id2label)
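For anyone landing here later, a minimal sketch of the same idea with the label mappings passed at load time instead of patched onto the config afterwards. The label names are copied from the config printed earlier in this thread; the helper `load_model` is a hypothetical name, and the `model.classifier` check assumes the BERT token-classification architecture.

```python
# Label names in id order, copied from the wnut_17 config shown above.
label_names = [
    "O",
    "B-corporation", "I-corporation",
    "B-creative-work", "I-creative-work",
    "B-group", "I-group",
    "B-location", "I-location",
    "B-person", "I-person",
    "B-product", "I-product",
]

my_id2label = dict(enumerate(label_names))
my_label2id = {label: idx for idx, label in my_id2label.items()}


def load_model(model_name="dslim/bert-base-NER"):
    """Load the checkpoint with a fresh 13-label head (hypothetical helper)."""
    # Imported lazily so the mappings above can be built without transformers.
    from transformers import AutoModelForTokenClassification

    model = AutoModelForTokenClassification.from_pretrained(
        model_name,
        num_labels=len(my_id2label),
        id2label=my_id2label,
        label2id=my_label2id,
        # The checkpoint's 9-label classifier weights cannot be loaded into a
        # 13-label layer, so they are skipped and the head is re-initialized.
        ignore_mismatched_sizes=True,
    )
    # Sanity check: the output layer should have one unit per label.
    assert model.classifier.out_features == len(my_id2label)
    return model
```

Calling `load_model()` downloads the checkpoint, so it is left uncalled here. The encoder weights are reused; only the classifier starts from scratch and has to be trained.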

nielsr · December 23, 2021, 9:43am

=> yes. If you change the number of output neurons, then you’ll get a new linear layer whose weights and bias are randomly initialized.

Also, I’m not sure why you are setting _num_labels; you should only need num_labels.
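To illustrate why one of the two is enough: my understanding, which is an assumption about library internals worth checking against your installed version, is that num_labels on the config behaves like a property derived from id2label. The class below is a simplified, hypothetical stand-in written for this post, not the transformers code.

```python
class ToyConfig:
    """Hypothetical stand-in for a model config; NOT the transformers class."""

    def __init__(self, id2label):
        self.id2label = dict(id2label)
        self.label2id = {name: idx for idx, name in self.id2label.items()}

    @property
    def num_labels(self):
        # The label count is read off the mapping rather than stored separately.
        return len(self.id2label)

    @num_labels.setter
    def num_labels(self, value):
        # Only rebuild the mappings if the requested count disagrees with them.
        if len(self.id2label) != value:
            self.id2label = {i: f"LABEL_{i}" for i in range(value)}
            self.label2id = {name: idx for idx, name in self.id2label.items()}


config = ToyConfig({0: "O", 1: "B-person", 2: "I-person"})
count_before = config.num_labels      # 3, derived from id2label

config.num_labels = 3                 # consistent with the mapping: nothing changes
names_kept = config.id2label[1]       # still "B-person"

config.num_labels = 5                 # inconsistent: generic names are generated
names_reset = config.id2label[1]      # "LABEL_1"
```

Under that assumption, setting id2label and label2id already fixes the label count, and a separate private _num_labels field adds nothing.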

Aliseyfi · December 23, 2021, 4:59pm

Thanks for double confirming @nielsr.

_num_labels is the second entry in the BertConfig printed above, and yes, changing it is optional.
