I have a BERT model:

`model = BertForMaskedLM.from_pretrained('bert-base-uncased')`
I notice that `model.state_dict().keys()` has 205 elements, while `[name for name, _ in model.named_parameters()]` has only 202. The three missing elements are `bert.embeddings.position_ids`, `cls.predictions.decoder.weight`, and `cls.predictions.decoder.bias`. The first one shows up in `model.named_buffers()`, but what about the other two? What are they doing there?
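For concreteness, here is a minimal toy module (my own names, not the actual BERT classes) that reproduces the same mismatch via weight tying, which I suspect is what is going on with the decoder:

```python
import torch.nn as nn

class TiedDecoder(nn.Module):
    """Toy model: the decoder's weight is tied to the embedding weight."""
    def __init__(self):
        super().__init__()
        self.embed = nn.Embedding(10, 4)
        self.decoder = nn.Linear(4, 10)
        # weight tying: both attributes point at the same Parameter
        self.decoder.weight = self.embed.weight

model = TiedDecoder()
state_keys = set(model.state_dict().keys())
# named_parameters() de-duplicates shared tensors by default,
# so the tied weight is only yielded once (as 'embed.weight')
param_keys = {name for name, _ in model.named_parameters()}

print(sorted(state_keys - param_keys))  # ['decoder.weight']
```

So `state_dict()` lists every attribute path, while `named_parameters()` yields each underlying tensor only once. Is that the same mechanism behind `cls.predictions.decoder.weight` and `cls.predictions.decoder.bias`?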
Another question: I noticed that `model.named_buffers()` also contains `bert.embeddings.token_type_ids`, but this is not in `model.state_dict()`. Why is that? Shouldn't it be important to store this in the state as well?
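I wondered whether this is the non-persistent buffer mechanism; here is a minimal sketch (again my own toy module, not the real BERT code) of how a buffer can appear in `named_buffers()` but not in `state_dict()`:

```python
import torch
import torch.nn as nn

class WithBuffers(nn.Module):
    def __init__(self):
        super().__init__()
        # persistent buffer (the default): saved in state_dict
        self.register_buffer("position_ids", torch.arange(8))
        # non-persistent buffer: visible to named_buffers(),
        # but excluded from state_dict (never saved or loaded)
        self.register_buffer(
            "token_type_ids",
            torch.zeros(8, dtype=torch.long),
            persistent=False,
        )

m = WithBuffers()
buffer_keys = {name for name, _ in m.named_buffers()}
state_keys = set(m.state_dict().keys())

print(sorted(buffer_keys - state_keys))  # ['token_type_ids']
```

If that is indeed how `token_type_ids` is registered, is the reasoning that it is a constant (all zeros by default) that can always be recreated, so there is no point serializing it in the checkpoint?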