Weights in BERT model

I have a BERT model loaded with model = BertForMaskedLM.from_pretrained('bert-base-uncased').

I notice that model.state_dict().keys() has 205 elements, while [i[0] for i in list(model.named_parameters())] has only 202. The three missing elements are 'bert.embeddings.position_ids', 'cls.predictions.decoder.weight', and 'cls.predictions.decoder.bias'. The first shows up in model.named_buffers(), but what about the other two? What are they doing?
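For reference, here is one way to reproduce the comparison (a minimal sketch; the exact key counts may differ across transformers versions, the expected output shown in the comments is taken from the observation above):

```python
import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained('bert-base-uncased')

state_keys = set(model.state_dict().keys())
param_names = {name for name, _ in model.named_parameters()}
buffer_names = {name for name, _ in model.named_buffers()}

# Keys present in the state dict but absent from the trainable parameters
print(state_keys - param_names)
# e.g. {'bert.embeddings.position_ids',
#       'cls.predictions.decoder.weight',
#       'cls.predictions.decoder.bias'}

# Of those, which ones are registered buffers rather than parameters?
print((state_keys - param_names) & buffer_names)
# e.g. {'bert.embeddings.position_ids'}
```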

Another question: I noticed that model.named_buffers() also contains bert.embeddings.token_type_ids, but that key is not in model.state_dict(). Why is that? Shouldn't it be important to store this in the state dict as well?
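Worth noting: in PyTorch, a buffer registered with persistent=False shows up in named_buffers() but is deliberately excluded from state_dict(), which matches what you describe. A minimal standalone sketch of that behavior (not BERT-specific, hypothetical buffer names):

```python
import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        # Persistent buffer: included in the state dict when saving
        self.register_buffer('persistent_buf', torch.zeros(3))
        # Non-persistent buffer: visible to named_buffers(), but not serialized
        self.register_buffer('transient_buf', torch.ones(3), persistent=False)

m = Demo()
print([name for name, _ in m.named_buffers()])  # ['persistent_buf', 'transient_buf']
print(list(m.state_dict().keys()))              # ['persistent_buf']
```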


I have the same question.
But I found that cls.predictions.decoder.weight and bert.embeddings.word_embeddings.weight hold the same weights, which suggests the decoder's weight is tied to the input word embeddings.
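You can verify that directly; a quick check (assuming the same bert-base-uncased checkpoint):

```python
import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained('bert-base-uncased')

decoder_w = model.cls.predictions.decoder.weight
embed_w = model.bert.embeddings.word_embeddings.weight

print(decoder_w is embed_w)             # True if both modules share one tensor
print(torch.equal(decoder_w, embed_w))  # True: identical values either way
```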