Transfer learning

Hello everyone, I have a quick question. I trained a masked language model starting from “AutoModelForMaskedLM.from_pretrained” (as continual pre-training). I then save that model and want to use it as the starting point for a text classification task, so I use AutoModel.from_pretrained to instantiate the new architecture and finally a load_from_ckpt function to copy the weights from the pre-trained model into the new instance. However, the layer names of the two models differ slightly: one has the “bert” prefix and the other does not. This causes a conflict when loading the weights, since I match layers by name. What is the best way to handle this situation? What I am doing now is using AutoModelForMaskedLM.from_pretrained to load the fine-tuning architecture as well, even though I am not training for MLM.
Many thanks!


@nielsr or @lhoestq, any thoughts on this would be more than welcome!


Normally you should only use the from_pretrained() method to load the weights into your new model. You don’t need a load_from_ckpt function.
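For example, something along these lines should work. This is a minimal sketch, assuming transformers and torch are installed; it uses a tiny randomly initialized BERT (the config sizes are illustrative only) so it runs offline, but in practice you would point from_pretrained at the directory where you saved your continually pre-trained model:

```python
# Sketch: save an MLM model in HF format, then reload it under a
# classification head with from_pretrained (no manual weight copying).
import tempfile

import torch
from transformers import (
    AutoModelForSequenceClassification,
    BertConfig,
    BertForMaskedLM,
)

# Tiny illustrative config so this runs without downloading anything.
config = BertConfig(
    vocab_size=100, hidden_size=32, num_hidden_layers=2,
    num_attention_heads=2, intermediate_size=64,
)
mlm = BertForMaskedLM(config)  # stands in for your continually pre-trained model

with tempfile.TemporaryDirectory() as ckpt_dir:
    mlm.save_pretrained(ckpt_dir)  # HF format: config.json + weights
    clf = AutoModelForSequenceClassification.from_pretrained(ckpt_dir, num_labels=2)

# from_pretrained matched the "bert.*" weights by name, dropped the MLM head,
# and freshly initialized the classifier, so the prefix mismatch never arises.
assert torch.equal(
    clf.bert.embeddings.word_embeddings.weight,
    mlm.bert.embeddings.word_embeddings.weight,
)
```

from_pretrained also prints a warning listing the newly initialized weights (here the classifier), which is a handy sanity check that only the head was replaced.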

Many thanks @nielsr! The thing is that we use PyTorch Lightning, which saves the model in ckpt format automatically and also provides this load_from_ckpt function. Do you know whether PL’s ckpt format is compatible with the HF format?
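A Lightning .ckpt is a plain torch-saved dict whose "state_dict" entry holds the LightningModule’s weights, so the two formats can be bridged by renaming keys. A sketch, under the hypothetical assumption that your LightningModule stores the HF model as `self.model` (so every checkpoint key carries a "model." prefix; adjust the prefix to whatever attribute name you actually use):

```python
# Sketch of converting a Lightning .ckpt back to HF weights. Assumption
# (hypothetical): the LightningModule stores the HF model as `self.model`,
# so every key in the checkpoint's state_dict starts with "model.".

def strip_prefix(state_dict, prefix="model."):
    """Remove the LightningModule attribute prefix from each weight name."""
    return {
        (k[len(prefix):] if k.startswith(prefix) else k): v
        for k, v in state_dict.items()
    }

# In practice the checkpoint is a plain torch pickle:
#   ckpt = torch.load("last.ckpt", map_location="cpu")
#   hf_model.load_state_dict(strip_prefix(ckpt["state_dict"]))
#   hf_model.save_pretrained("converted/")  # now loadable via from_pretrained

# Toy demonstration with a fake state_dict:
toy = {"model.bert.encoder.layer.0.output.dense.weight": 0}
print(strip_prefix(toy))
# {'bert.encoder.layer.0.output.dense.weight': 0}
```

Alternatively, if your LightningModule wraps an HF PreTrainedModel, you can simply call `self.model.save_pretrained(...)` at the end of training and skip the conversion entirely.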
