Differences between Config.from_pretrained and Model.from_pretrained

alejandrocr · July 20, 2021, 3:08pm

Hey there! I have a question regarding the differences between loading a multilingual BERT model from pretrained weights and from a pretrained Config:

Shouldn’t the two models defined below have the same weights?

from transformers import BertConfig, BertModel

mbert_model_1 = BertModel.from_pretrained("bert-base-multilingual-uncased")

mbert_config = BertConfig.from_pretrained("bert-base-multilingual-uncased")
mbert_model_2 = BertModel(mbert_config)

I have checked and they have the same architecture, but the layer weights (and the results obtained when using them) are different.

Sorry if it’s a well-known question but I had never loaded models from Configs and I’ve found this discrepancy. (I’ve looked for a previous question related to this topic but I haven’t found any).

Thanks for your help!

sgugger · July 20, 2021, 8:25pm

You should have a look at the relevant section in the course and the corresponding video, where all of this is explained.

The first model is initialized with the pretrained weights. The second has the same architecture but is initialized randomly, because a config only describes the architecture and contains no weights.
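To see this without downloading anything, here is a minimal sketch. It assumes `torch` and `transformers` are installed, and it uses a tiny made-up `BertConfig` (the sizes are illustrative, not those of `bert-base-multilingual-uncased`) plus a temporary directory standing in for a checkpoint. The helper `same_weights` is hypothetical, written just for this comparison.

```python
import tempfile

import torch
from transformers import BertConfig, BertModel

# Tiny illustrative config (assumed values) so the sketch runs quickly and offline.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)


def same_weights(a, b):
    """Hypothetical helper: True if every tensor in both state dicts is identical."""
    sa, sb = a.state_dict(), b.state_dict()
    return all(torch.equal(sa[k], sb[k]) for k in sa)


# Two models built from the same config: same architecture, independent random init.
model_a = BertModel(config)
model_b = BertModel(config)

with tempfile.TemporaryDirectory() as tmp:
    # Stand-in for a pretrained checkpoint: saves config.json and the weights.
    model_a.save_pretrained(tmp)

    # Model.from_pretrained reads the config AND loads the saved weights.
    model_c = BertModel.from_pretrained(tmp)

    # Config.from_pretrained reads only config.json; the model built from it
    # is randomly initialized again.
    config_c = BertConfig.from_pretrained(tmp)
    model_d = BertModel(config_c)

random_init_matches = same_weights(model_a, model_b)
reloaded_matches = same_weights(model_a, model_c)
config_only_matches = same_weights(model_a, model_d)

print("two models from the same config match:", random_init_matches)
print("BertModel.from_pretrained matches saved model:", reloaded_matches)
print("BertModel(BertConfig.from_pretrained(...)) matches saved model:", config_only_matches)
```

Only the model loaded with `BertModel.from_pretrained` should match the saved weights. In the original question, `mbert_model_2` is in the same position as `model_d` here: correct architecture, fresh random weights.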
