Weights in BERT model

I have a BERT model loaded with model = BertForMaskedLM.from_pretrained('bert-base-uncased').

I notice that model.state_dict().keys() has 205 elements, while [i[0] for i in list(model.named_parameters())] has only 202. The three missing elements are 'bert.embeddings.position_ids', 'cls.predictions.decoder.weight', and 'cls.predictions.decoder.bias'. The first shows up in model.named_buffers(), but what about the other two? What are they doing?
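For reference, here is one way to reproduce the comparison (a minimal sketch; the exact key counts may differ across transformers versions, the expected output shown in the comments is taken from the observation above):

```python
import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained('bert-base-uncased')

state_keys = set(model.state_dict().keys())
param_names = {name for name, _ in model.named_parameters()}
buffer_names = {name for name, _ in model.named_buffers()}

# Keys present in the state dict but absent from the trainable parameters
print(state_keys - param_names)
# e.g. {'bert.embeddings.position_ids',
#       'cls.predictions.decoder.weight',
#       'cls.predictions.decoder.bias'}

# Of those, which ones are registered buffers rather than parameters?
print((state_keys - param_names) & buffer_names)
# e.g. {'bert.embeddings.position_ids'}
```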

Another question: I noticed that model.named_buffers() also contains bert.embeddings.token_type_ids, but that key is not in model.state_dict(). Why is that? Shouldn't it be important to store this in the state dict as well?
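Worth noting: in PyTorch, a buffer registered with persistent=False shows up in named_buffers() but is deliberately excluded from state_dict(), which matches what you describe. A minimal standalone sketch of that behavior (not BERT-specific, hypothetical buffer names):

```python
import torch
import torch.nn as nn

class Demo(nn.Module):
    def __init__(self):
        super().__init__()
        # Persistent buffer: included in the state dict when saving
        self.register_buffer('persistent_buf', torch.zeros(3))
        # Non-persistent buffer: visible to named_buffers(), but not serialized
        self.register_buffer('transient_buf', torch.ones(3), persistent=False)

m = Demo()
print([name for name, _ in m.named_buffers()])  # ['persistent_buf', 'transient_buf']
print(list(m.state_dict().keys()))              # ['persistent_buf']
```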


I have the same question.
But I found that cls.predictions.decoder.weight and bert.embeddings.word_embeddings.weight hold the same weights, which suggests the decoder's weight is tied to the input word embeddings.
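You can verify that directly; a quick check (assuming the same bert-base-uncased checkpoint):

```python
import torch
from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained('bert-base-uncased')

decoder_w = model.cls.predictions.decoder.weight
embed_w = model.bert.embeddings.word_embeddings.weight

print(decoder_w is embed_w)             # True if both modules share one tensor
print(torch.equal(decoder_w, embed_w))  # True: identical values either way
```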