Retrain/reuse fine-tuned models on a different set of labels

Hello,
I am wondering whether it is possible to reuse or retrain a fine-tuned model with a new set of labels (the new set may contain labels that were not used to fine-tune the model, or it may be a subset of those labels)?
What I am trying to do is fine-tune a pre-trained model for a task (e.g. NER) on a domain-free dataset, then reuse/retrain this fine-tuned model on a similar task in a more specific domain (e.g. NER for healthcare), where the set of labels may not be the same.
I have already fine-tuned a BERT model to do NER on the WNUT17 data, based on the token classification example in the Transformers GitHub repository. After that, I tried to retrain the fine-tuned model by adding a new label and providing training data that contains this label, but training failed with this error:

    RuntimeError: Error(s) in loading state_dict for BertForTokenClassification:
    size mismatch for classifier.weight: copying a param with shape torch.Size([13, 1024]) from checkpoint, the shape in current model is torch.Size([15, 1024]).
    size mismatch for classifier.bias: copying a param with shape torch.Size([13]) from checkpoint, the shape in current model is torch.Size([15]).

Is it possible to do this with Transformers, and if so, how? Thank you in advance!

AFAIK it is currently not possible to retrain a fine-tuned model on a new set of labels. A workaround for this is to go back to the pre-trained model and fine-tune it on the whole (old + new) data with a superset of the old + new labels. Is that true?
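For concreteness, a minimal sketch of what I mean by that workaround (the label lists here are made up):

    # Hypothetical label sets; replace with the real ones.
    old_labels = ["O", "B-PER", "I-PER", "B-LOC", "I-LOC"]
    new_labels = ["O", "B-DRUG", "I-DRUG"]

    # Build the superset, then fine-tune the original pre-trained
    # checkpoint on the combined (old + new) training data.
    labels = sorted(set(old_labels) | set(new_labels))
    label2id = {label: i for i, label in enumerate(labels)}
    id2label = {i: label for label, i in label2id.items()}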

I know it’s more of an ML question than a question specific to this package, but I would really appreciate it if anyone could point me to a reference that explains this. Thank you in advance.

In general, when loading a pretrained model, the library expects tensor shapes to match those of the pretrained model you want to load. This is why you are getting your error, and I don’t think there is a quick fix to this.
I think you need to manually write some loading method in this case, to load everything but the last layer.
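For example, something along these lines (an untested sketch on the PyTorch side; `old_model_dir` and the label list are placeholders for your own checkpoint and labels):

    import torch
    from transformers import BertConfig, BertForTokenClassification

    new_labels = ["O", "B-DRUG", "I-DRUG"]  # hypothetical new label set
    config = BertConfig.from_pretrained(
        "old_model_dir",
        num_labels=len(new_labels),
        id2label=dict(enumerate(new_labels)),
        label2id={label: i for i, label in enumerate(new_labels)},
    )

    # Build a fresh model whose classification head has the new size ...
    model = BertForTokenClassification(config)

    # ... then copy over every fine-tuned weight except the old head.
    state_dict = torch.load("old_model_dir/pytorch_model.bin", map_location="cpu")
    state_dict = {k: v for k, v in state_dict.items() if not k.startswith("classifier")}
    # strict=False tolerates the (intentionally) missing classifier weights.
    model.load_state_dict(state_dict, strict=False)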


Thank you for your reply.
If you don’t mind, could you explain more about how to write such a loading method manually and use it to continue training with a different set of labels?
Also, what are the pros and cons of doing that instead of fine-tuning a pre-trained model on the whole (old + new) data with a superset of the old + new labels?

@sgugger
So I tried to make some modifications to https://github.com/huggingface/transformers/blob/master/examples/token-classification/run_tf_ner.py
I tried to do what you said, but I don’t know whether I am doing it right. What I am trying to do is load the model as-is using from_pretrained, and then replace the last layer if the labels are not the same as the ones the model was fine-tuned with.
Could you please review it and tell me if I am doing it right? Thank you in advance!

    # Build a config that describes the NEW label set.
    config = AutoConfig.from_pretrained(
        model_args.config_name if model_args.config_name else model_args.model_name_or_path,
        num_labels=num_labels,
        id2label=label_map,
        label2id={label: i for i, label in enumerate(labels)},
        cache_dir=model_args.cache_dir,
    )
    tokenizer = AutoTokenizer.from_pretrained(
        model_args.tokenizer_name if model_args.tokenizer_name else model_args.model_name_or_path,
        cache_dir=model_args.cache_dir,
        use_fast=model_args.use_fast,
    )

    with training_args.strategy.scope():
        # Load the fine-tuned model with its ORIGINAL config, so all
        # tensor shapes (including the old classifier head) still match.
        model = TFAutoModelForTokenClassification.from_pretrained(
            model_args.model_name_or_path,
            from_pt=bool(".bin" in model_args.model_name_or_path),
            cache_dir=model_args.cache_dir,
        )

        # If the label set changed, swap in a fresh, randomly initialized
        # classification head sized for the new labels, then attach the
        # new config to the model.
        if model.config.num_labels != config.num_labels or model.config.id2label != config.id2label:
            model.classifier = tf.keras.layers.Dense(
                config.num_labels, kernel_initializer=get_initializer(config.initializer_range), name="classifier"
            )
        model.config = config
I’m no expert on the TF side, but this looks good to me.


Thank you very much @sgugger

@kevinyauris, did you get your model working with the new labels?
The GitHub repository link you referenced no longer works, so if you have any more info, that would be very helpful.
Thanks!!