Transfer learning

Hello everyone, I have a quick question. I trained a masked language model starting from `AutoModelForMaskedLM.from_pretrained` (as continual pre-training). I then saved that model and want to use it as the starting point for a text classification task, so I use `AutoModel.from_pretrained` to instantiate the model architecture and finally a `load_from_ckpt` function to copy the weights from the pre-trained model into the newly instantiated one. However, I see that the layer names differ slightly between the two models: one has the prefix “bert” and the other does not. This causes a conflict when loading the weights, since I match layers by name. What is the best way to address this situation? What I am doing now is using `AutoModelForMaskedLM.from_pretrained` to load the architecture for fine-tuning as well, even though I am not training for MLM.
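For reference, a minimal sketch of the naming mismatch I mean (the checkpoint path is a placeholder):

```python
from transformers import AutoModel, AutoModelForMaskedLM

# Both loaded from the same BERT-style checkpoint (path is a placeholder)
mlm_model = AutoModelForMaskedLM.from_pretrained("path/to/pretrained-mlm")
base_model = AutoModel.from_pretrained("path/to/pretrained-mlm")

# The MLM model prefixes the encoder weights with "bert.", the bare model does not
print(next(iter(mlm_model.state_dict())))   # e.g. bert.embeddings.word_embeddings.weight
print(next(iter(base_model.state_dict())))  # e.g. embeddings.word_embeddings.weight
```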
Many thanks!


@nielsr or @lhoestq, any thoughts on this would be more than welcome!

Hi,

Normally, you should just use the `from_pretrained()` method to load the weights into your new model; it takes care of mapping the weight names between architectures, including the “bert” prefix. You don’t need a `load_from_ckpt` function.
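For example, something along these lines should work (the path is a placeholder; `from_pretrained` will warn that the classification head is newly initialized, which is expected):

```python
from transformers import AutoModelForSequenceClassification

# Load the encoder weights from the saved MLM checkpoint directory.
# from_pretrained maps the "bert." prefix automatically and randomly
# initializes the new classification head on top.
model = AutoModelForSequenceClassification.from_pretrained(
    "path/to/saved-mlm-checkpoint",
    num_labels=2,  # adjust to your classification task
)
```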

Many thanks @nielsr! The thing is that we use PyTorch Lightning, which automatically saves the model in `.ckpt` format and provides the `load_from_checkpoint` function. Do you know if the PyTorch Lightning `.ckpt` format is compatible with the HF format?
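In case it helps, here is a minimal sketch of what I imagine the bridge between the two formats would look like, assuming the `LightningModule` stores the HF model under a `model` attribute (the paths and the base checkpoint name are illustrative):

```python
import torch
from transformers import AutoModelForMaskedLM

# A Lightning .ckpt file is a regular torch checkpoint; the model weights
# live under the "state_dict" key, prefixed with the attribute name used
# inside the LightningModule (assumed here to be "model.").
ckpt = torch.load("path/to/checkpoint.ckpt", map_location="cpu")
state_dict = {
    k.removeprefix("model."): v for k, v in ckpt["state_dict"].items()
}

# Load the weights into a freshly instantiated HF model, then save it in
# HF format so that later from_pretrained calls work directly.
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")
model.load_state_dict(state_dict)
model.save_pretrained("path/to/hf-format-model")
```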
