I have a BERT model: `model = BertForMaskedLM.from_pretrained('bert-base-uncased')`.
I notice that `model.state_dict().keys()` has 205 elements, while `[n for n, _ in model.named_parameters()]` only has 202. The three missing elements are `bert.embeddings.position_ids`, `cls.predictions.decoder.weight`, and `cls.predictions.decoder.bias`. The first is in `model.named_buffers()`, but what about the other two? What are they doing?
Another question: I noticed that `model.named_buffers()` also contains `bert.embeddings.token_type_ids`, but this is not in `model.state_dict()`. Why is that? Shouldn't it be important to store this in the state as well?
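For reference, here is a toy module (all names made up, not from BERT) that reproduces both kinds of discrepancy I'm seeing: a parameter shared between two modules (weight tying) shows up once in `named_parameters()` but under both keys in `state_dict()`, and a buffer registered with `persistent=False` shows up in `named_buffers()` but not in `state_dict()`. I suspect something like this is going on, but I'd appreciate confirmation:

```python
import torch
import torch.nn as nn

class Toy(nn.Module):
    def __init__(self):
        super().__init__()
        self.emb = nn.Embedding(10, 4)
        self.decoder = nn.Linear(4, 10, bias=False)
        # Tie the decoder weight to the embedding weight (same Parameter object).
        self.decoder.weight = self.emb.weight
        # Persistent buffer: appears in both named_buffers() and state_dict().
        self.register_buffer("pos_ids", torch.arange(4))
        # Non-persistent buffer: appears in named_buffers() but NOT state_dict().
        self.register_buffer("tok_ids", torch.zeros(4), persistent=False)

m = Toy()
params = {n for n, _ in m.named_parameters()}   # dedupes shared parameters
buffers = {n for n, _ in m.named_buffers()}
state = set(m.state_dict().keys())

print(sorted(state - params))   # ['decoder.weight', 'pos_ids']
print(sorted(buffers - state))  # ['tok_ids']
```

`named_parameters()` deduplicates shared tensors by default (`remove_duplicate=True`), while `state_dict()` lists every registered key, so the tied weight appears under both names there.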