Differences between Config.from_pretrained and Model.from_pretrained

Hey there! I have a question regarding the differences between loading a multilingual BERT model from pretrained weights and from a pretrained Config:

Shouldn’t the two models defined below have the same weights?

from transformers import BertConfig, BertModel

mbert_model_1 = BertModel.from_pretrained("bert-base-multilingual-uncased")

mbert_config = BertConfig.from_pretrained("bert-base-multilingual-uncased")
mbert_model_2 = BertModel(mbert_config)

I have checked and they have the same architecture, but the layer weights (and the results obtained when using them) are different.

Sorry if it’s a well-known question but I had never loaded models from Configs and I’ve found this discrepancy. (I’ve looked for a previous question related to this topic but I haven’t found any).

Thanks for your help! :hugs:

You should have a look at the relevant section in the course and the corresponding video, where all of this is explained.

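The first model is initialized with the pretrained weights. The second has the same architecture but is initialized randomly: BertConfig.from_pretrained only loads the configuration (the hyperparameters that define the architecture), not the weights.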
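If you want to check this yourself without downloading anything, here is a minimal sketch. It assumes `torch` and `transformers` are installed. The tiny config values and the `same_weights` helper are made up for illustration and are not part of the original question. It builds two models from the same config, then saves one and reloads it with `from_pretrained`:

```python
import tempfile

import torch
from transformers import BertConfig, BertModel

# Tiny, hypothetical config so the sketch runs quickly and offline.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
)


def same_weights(model_x, model_y):
    """Return True only if every tensor in the two state dicts is identical."""
    state_x, state_y = model_x.state_dict(), model_y.state_dict()
    return all(torch.equal(state_x[name], state_y[name]) for name in state_x)


# Two models built from the same config: same architecture,
# but each one gets its own random initialization.
model_a = BertModel(config)
model_b = BertModel(config)

# Saving model_a and reloading it with from_pretrained restores its weights.
with tempfile.TemporaryDirectory() as tmp_dir:
    model_a.save_pretrained(tmp_dir)
    model_c = BertModel.from_pretrained(tmp_dir)

random_init_matches = same_weights(model_a, model_b)
reloaded_matches = same_weights(model_a, model_c)

print("config-only models share weights:", random_init_matches)
print("from_pretrained restores weights:", reloaded_matches)
```

The same logic applies to the checkpoint in the question: `BertModel(mbert_config)` only gets the architecture, so if you need the pretrained weights you should use `BertModel.from_pretrained(...)`.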
