Hello,
Is there a way to fine-tune the base BERT/RoBERTa architecture on a task like sequence classification, and then use the fine-tuned model as the base model for MLM predictions? I tried this by copying the state dict from the sequence classification checkpoint into the MLM architecture, but it did not work at all. It seems the encoder weights fine-tuned on the sequence classification task do not play well with the MLM objective.
Here is a code snippet:
from transformers import RobertaForMaskedLM

# Load the fine-tuned checkpoint into RobertaForMaskedLM; the shared encoder
# weights are copied over, and the lm_head is freshly initialized because the
# sequence classification checkpoint does not contain one.
roberta_mlm_model = RobertaForMaskedLM.from_pretrained(MODEL_FILE)

# Load the stock model to recover its pretrained MLM head.
default_model = RobertaForMaskedLM.from_pretrained('roberta-base')

# Swap the pretrained head into the fine-tuned model.
roberta_mlm_model.lm_head.load_state_dict(default_model.lm_head.state_dict())
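
For context, this is roughly the sanity check I run afterwards to inspect the MLM predictions (a minimal sketch; it assumes a recent transformers version where the forward pass returns an output object with a .logits attribute, and the example sentence is just an illustration):

import torch
from transformers import RobertaTokenizer

tokenizer = RobertaTokenizer.from_pretrained('roberta-base')

# Mask a single token and inspect the model's top predictions for it.
text = f"The capital of France is {tokenizer.mask_token}."
inputs = tokenizer(text, return_tensors='pt')

with torch.no_grad():
    logits = roberta_mlm_model(**inputs).logits

# Position of the <mask> token in the input sequence.
mask_index = (inputs['input_ids'] == tokenizer.mask_token_id).nonzero(as_tuple=True)[1]
top_ids = logits[0, mask_index].topk(5).indices[0]
print(tokenizer.convert_ids_to_tokens(top_ids.tolist()))

With the weight swap above, the predictions I get at the masked position are essentially garbage.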
Can someone tell me if I am thinking in the right direction here?
Nikhil