Retrain/reuse fine-tuned models on a different set of labels

I am wondering whether it is possible to reuse or retrain a fine-tuned model with a new set of labels (the new set either contains labels not seen before or is a subset of the labels used to fine-tune the model).
What I am trying to do is fine-tune a pre-trained model for a task (e.g. NER) on a domain-free dataset, then reuse/retrain this fine-tuned model for a similar task in a more specific domain (e.g. NER for healthcare), where the set of labels may not be the same.
I have already fine-tuned a BERT model for NER on the WNUT17 data, based on the token classification example in the Transformers GitHub repository. After that, I tried to retrain the fine-tuned model by adding a new label and providing training data that contains this label, but training failed with this error:

RuntimeError: Error(s) in loading state_dict for BertForTokenClassification:
size mismatch for classifier.weight: copying a param with shape torch.Size([13, 1024]) from checkpoint, the shape in current model is torch.Size([15, 1024]).
size mismatch for classifier.bias: copying a param with shape torch.Size([13]) from checkpoint, the shape in current model is torch.Size([15]).

Is it possible to do this with Transformers and if so how? Thank you in advance!

AFAIK it is currently not possible to retrain a fine-tuned model on a new set of labels. A workaround is to fine-tune a pre-trained model on the whole (old + new) data with a superset of the old and new labels. Is that true?
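For the superset workaround, one detail that may help is keeping every old label at its original index, so old and new training data share one consistent mapping. A minimal sketch, assuming plain Python lists of label strings; the helper `merge_label_sets` and both label lists are hypothetical and only for illustration:

```python
def merge_label_sets(old_labels, new_labels):
    """Return a superset label list that keeps every old label at its old index."""
    merged = list(old_labels)
    seen = set(old_labels)
    for label in new_labels:
        if label not in seen:
            # Labels that only exist in the new set are appended at the end.
            merged.append(label)
            seen.add(label)
    return merged


# Hypothetical label lists, not the real WNUT17 or healthcare label sets.
old_labels = ["O", "B-person", "I-person", "B-location", "I-location"]
new_labels = ["O", "B-person", "I-person", "B-disease", "I-disease"]

labels = merge_label_sets(old_labels, new_labels)
label2id = {label: i for i, label in enumerate(labels)}
id2label = {i: label for label, i in label2id.items()}
```

The resulting `labels`, `label2id` and `id2label` could then be passed to the config when fine-tuning from the original pre-trained checkpoint.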

I know it’s more of an ML question than a question specific to this package, but I would really appreciate it if anyone could point me to a reference that explains this. Thank you in advance.

In general, when loading a pretrained model, the library expects tensor shapes to match those of the pretrained model you want to load. This is why you are getting your error, and I don’t think there is a quick fix to this.
I think you need to manually write some loading method in this case, to load everything but the last layer.
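One possible way to "load everything but the last layer" in PyTorch is to keep only the checkpoint tensors whose name and shape match the new model and load them non-strictly. The sketch below is not the library's own loading code: `TinyTagger` is a hypothetical stand-in for a token classification model, and `load_matching_weights` is an illustrative helper name.

```python
import torch
from torch import nn


class TinyTagger(nn.Module):
    """Hypothetical stand-in for a token classifier: an encoder plus a classifier head."""

    def __init__(self, hidden_size: int, num_labels: int):
        super().__init__()
        self.encoder = nn.Linear(hidden_size, hidden_size)
        self.classifier = nn.Linear(hidden_size, num_labels)

    def forward(self, x):
        return self.classifier(torch.tanh(self.encoder(x)))


def load_matching_weights(model: nn.Module, checkpoint_state: dict) -> list:
    """Copy every checkpoint tensor whose name and shape match; return the skipped names."""
    own_state = model.state_dict()
    compatible = {
        name: tensor
        for name, tensor in checkpoint_state.items()
        if name in own_state and own_state[name].shape == tensor.shape
    }
    # strict=False leaves parameters absent from `compatible` at their fresh initialization.
    model.load_state_dict(compatible, strict=False)
    return sorted(set(checkpoint_state) - set(compatible))


old_model = TinyTagger(hidden_size=16, num_labels=13)  # plays the fine-tuned checkpoint
new_model = TinyTagger(hidden_size=16, num_labels=15)  # same encoder, two extra labels
skipped = load_matching_weights(new_model, old_model.state_dict())
print(skipped)  # names of the tensors that were left freshly initialized
```

Only the classifier tensors are skipped here, so the encoder keeps its fine-tuned weights while the new head is trained from scratch. Newer Transformers releases also accept `ignore_mismatched_sizes=True` in `from_pretrained`, which skips mismatched tensors in a similar way; it is worth checking that your installed version supports it.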


Thank you for your reply.
If you don’t mind, could you explain more about how to write such a loading method manually and use it to continue training with a different set of labels?
Also, what are the pros and cons of doing this instead of fine-tuning a pre-trained model on the whole (old + new) data with a superset of the old and new labels?

So I tried modifying the token classification example script to do what you suggested, but I don’t know whether I am doing it right. What I am trying to do is load the model as it is using from_pretrained and then replace the last layer if its labels are not the same as the labels I defined.
Could you please review it and tell me if I am doing it right? Thank you in advance!

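    import tensorflow as tf
    from transformers import AutoConfig, AutoTokenizer, TFAutoModelForTokenClassification
    from transformers.modeling_tf_utils import get_initializer

    # `labels`, `model_args` and `training_args` come from the surrounding example script.
    config = AutoConfig.from_pretrained(
        model_args.config_name if model_args.config_name else model_args.model_name_or_path,
        num_labels=len(labels),
        id2label={i: label for i, label in enumerate(labels)},
        label2id={label: i for i, label in enumerate(labels)},
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
    )

    with training_args.strategy.scope():
        # Load the checkpoint with its own config so the saved classifier shape matches.
        model = TFAutoModelForTokenClassification.from_pretrained(
            model_args.model_name_or_path,
            from_pt=bool(".bin" in model_args.model_name_or_path),
        )
        # Swap in a freshly initialized head when the label set differs from the checkpoint's.
        if model.config.num_labels != config.num_labels or model.config.id2label != config.id2label:
            model.classifier = tf.keras.layers.Dense(
                config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
            )
        model.config = config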

I’m not an expert on the TF side, but this looks good to me.


Thank you very much @sgugger

@kevinyauris, did you get your model working with new labels?
The GitHub repository link you referenced no longer works, so if you have some more info, that would be very helpful.