BertForMaskedLM on a fine-tuned base model

ndharap · September 7, 2020, 8:20pm

Hello,
Is there a way to fine-tune the base BERT/RoBERTa architecture on a task like sequence classification, and then use the fine-tuned model as the base model for MLM predictions? I tried this by copying the state dict from the sequence-classification model into the MLM architecture, but that did not work at all. It seems the weights I carry over from the sequence-classification task do not play well with the MLM objective.

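Here is a code snippet:

```python
from transformers import RobertaForMaskedLM

# MODEL_FILE: path to the fine-tuned sequence-classification checkpoint (defined elsewhere)
# Load the fine-tuned 'roberta-base' model into RobertaForMaskedLM
roberta_mlm_model = RobertaForMaskedLM.from_pretrained(MODEL_FILE)

# Load the default (pretrained) model
default_model = RobertaForMaskedLM.from_pretrained('roberta-base')

# Swap in the pretrained weights for the LM head
roberta_mlm_model.lm_head.load_state_dict(default_model.lm_head.state_dict())
```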

Can someone tell me if I am thinking in the right direction here?

Nikhil
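The snippet above needs a local checkpoint (`MODEL_FILE`) and a download of `roberta-base`. Below is a minimal, self-contained sketch of the same head swap, assuming tiny randomly initialized RoBERTa configs stand in for both checkpoints so it runs offline (the config sizes and the helper name `swap_in_pretrained_head` are hypothetical). It also checks that the head parameters really match after the swap. One detail worth knowing: with the default `tie_word_embeddings=True`, the decoder weight of RoBERTa's LM head is shared with the input word embeddings, so loading the whole head's state dict also overwrites the fine-tuned embedding matrix.

```python
import tempfile

import torch
from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaForSequenceClassification,
)

# Hypothetical tiny config so the sketch runs without downloading roberta-base
config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
    num_labels=2,
)


def swap_in_pretrained_head(finetuned_dir: str, pretrained: RobertaForMaskedLM) -> RobertaForMaskedLM:
    """Hypothetical helper: load a fine-tuned checkpoint into the MLM architecture, then copy in a pretrained LM head."""
    # Encoder weights come from the fine-tuned checkpoint; lm_head.* is freshly initialized here
    mlm = RobertaForMaskedLM.from_pretrained(finetuned_dir)
    # Copy every LM-head parameter (decoder.weight is tied to the word embeddings by default)
    mlm.lm_head.load_state_dict(pretrained.lm_head.state_dict())
    return mlm


with tempfile.TemporaryDirectory() as finetuned_dir:
    # Stand-in for the fine-tuned sequence-classification model saved at MODEL_FILE
    RobertaForSequenceClassification(config).save_pretrained(finetuned_dir)
    # Stand-in for RobertaForMaskedLM.from_pretrained('roberta-base')
    pretrained = RobertaForMaskedLM(config)
    swapped = swap_in_pretrained_head(finetuned_dir, pretrained)

head_matches = torch.equal(swapped.lm_head.dense.weight, pretrained.lm_head.dense.weight)
print(head_matches)
```

Mechanically the swap succeeds; whether the fine-tuned encoder still produces representations the pretrained head can decode is a separate question, which is what the replies below address.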

BramVanroy · September 7, 2020, 9:28pm

This will be problematic because the heads are not compatible. You can fine-tune the model and reuse its weights in another architecture, but the heads are different and cannot be mapped onto each other. So if you fine-tune for sequence classification, save the model, and load it into the MLM version of the same architecture, the LM head will not have pretrained weights.
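The missing head can be seen directly in the loading report. A minimal sketch, again assuming a tiny randomly initialized config instead of a real fine-tuned checkpoint: passing `output_loading_info=True` makes `from_pretrained` also return which keys were missing (and therefore randomly initialized) and which were unexpected (and therefore discarded).

```python
import tempfile

from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    RobertaForSequenceClassification,
)

# Hypothetical tiny config standing in for roberta-base
config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
    num_labels=2,
)

with tempfile.TemporaryDirectory() as ckpt_dir:
    # Stand-in for the model fine-tuned on sequence classification
    RobertaForSequenceClassification(config).save_pretrained(ckpt_dir)
    # Load the same weights into the MLM architecture and ask for the loading report
    mlm, info = RobertaForMaskedLM.from_pretrained(ckpt_dir, output_loading_info=True)

# lm_head.* was not in the checkpoint, so it is randomly initialized
new_head_keys = sorted(k for k in info["missing_keys"] if k.startswith("lm_head"))
# classifier.* was in the checkpoint but has no place in RobertaForMaskedLM, so it is dropped
dropped_keys = sorted(k for k in info["unexpected_keys"] if k.startswith("classifier"))
print("randomly initialized:", new_head_keys)
print("discarded:", dropped_keys)
```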

ndharap · September 7, 2020, 9:58pm

Thanks @BramVanroy! That is what I thought: the weights will not be relevant. How would you suggest I solve this? I was thinking of training both the sequence-classification task and the MLM task on my dataset in a multi-task setting. Not sure if you or anyone else has any pointers.
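A minimal sketch of the multi-task idea, assuming one shared encoder trained on the summed loss `cls_loss + mlm_weight * mlm_loss`. The class `MultiTaskRoberta`, the helper `mask_tokens`, the 15% masking rate, the mask token id, and the tiny config are all hypothetical choices for illustration, not something established in this thread. The wrapper reuses the encoder and LM head of a `RobertaForMaskedLM`, so when built from `roberta-base` the MLM head starts pretrained and keeps being trained alongside the classifier.

```python
import math

import torch
import torch.nn.functional as F
from torch import nn
from transformers import RobertaConfig, RobertaForMaskedLM


class MultiTaskRoberta(nn.Module):
    """Hypothetical wrapper: one shared RoBERTa encoder, an MLM head, and a classification head."""

    def __init__(self, mlm_model: RobertaForMaskedLM, num_labels: int):
        super().__init__()
        self.encoder = mlm_model.roberta  # shared body
        self.lm_head = mlm_model.lm_head  # reuse the (pretrained) MLM head
        self.classifier = nn.Linear(mlm_model.config.hidden_size, num_labels)

    def forward(self, input_ids, attention_mask, mlm_labels, cls_labels, mlm_weight=1.0):
        hidden = self.encoder(input_ids=input_ids, attention_mask=attention_mask).last_hidden_state
        mlm_logits = self.lm_head(hidden)
        # Positions labelled -100 (unmasked tokens) are ignored by the MLM loss
        mlm_loss = F.cross_entropy(
            mlm_logits.reshape(-1, mlm_logits.size(-1)), mlm_labels.reshape(-1), ignore_index=-100
        )
        # Classify from the first (<s>) token, as RoBERTa's classification head does
        cls_loss = F.cross_entropy(self.classifier(hidden[:, 0]), cls_labels)
        return cls_loss + mlm_weight * mlm_loss


def mask_tokens(input_ids, mask_token_id, prob=0.15):
    """Hypothetical simplified masking: replace ~prob of tokens with mask_token_id (no 80/10/10 split)."""
    labels = input_ids.clone()
    masked = torch.rand(input_ids.shape) < prob
    masked[:, 0] = False  # never mask the leading <s> token
    masked[:, 1] |= ~masked.any(dim=1)  # guarantee at least one masked position per row
    labels[~masked] = -100
    return input_ids.masked_fill(masked, mask_token_id), labels


torch.manual_seed(0)
# Tiny random stand-in; in practice: RobertaForMaskedLM.from_pretrained("roberta-base")
config = RobertaConfig(vocab_size=100, hidden_size=32, num_hidden_layers=2,
                       num_attention_heads=2, intermediate_size=64, max_position_embeddings=64)
model = MultiTaskRoberta(RobertaForMaskedLM(config), num_labels=2)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

input_ids = torch.randint(5, 100, (4, 12))  # fake batch; ids 0-4 treated as special tokens here
input_ids[:, 0] = 0  # <s>
attention_mask = torch.ones_like(input_ids)
masked_ids, mlm_labels = mask_tokens(input_ids, mask_token_id=4)
cls_labels = torch.tensor([0, 1, 0, 1])

loss = model(masked_ids, attention_mask, mlm_labels, cls_labels, mlm_weight=0.5)
loss.backward()
optimizer.step()
optimizer.zero_grad()
print(loss.item())
```

The `mlm_weight` knob is one way to trade off how strongly the encoder is kept close to its MLM behaviour while it learns the classification task; a real setup would also mask only non-special, non-padding tokens.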
