from transformers import AlbertForQuestionAnswering

hf_model = AlbertForQuestionAnswering.from_pretrained("mfeb/albert-xxlarge-v2-squad2")
print(hf_model.state_dict().keys())
After downloading the model and running this code, I only get the following keys from state_dict(), all of which correspond to layer 0. Other models where I have done the same show keys for every layer (see the comparison sketch after the output below). Is something wrong with this specific model, or am I misunderstanding what state_dict().keys() does?
odict_keys(['albert.embeddings.word_embeddings.weight', 'albert.embeddings.position_embeddings.weight', 'albert.embeddings.token_type_embeddings.weight', 'albert.embeddings.LayerNorm.weight', 'albert.embeddings.LayerNorm.bias', 'albert.encoder.embedding_hidden_mapping_in.weight', 'albert.encoder.embedding_hidden_mapping_in.bias', 'albert.encoder.albert_layer_groups.0.albert_layers.0.full_layer_layer_norm.weight', 'albert.encoder.albert_layer_groups.0.albert_layers.0.full_layer_layer_norm.bias', 'albert.encoder.albert_layer_groups.0.albert_layers.0.attention.query.weight', 'albert.encoder.albert_layer_groups.0.albert_layers.0.attention.query.bias', 'albert.encoder.albert_layer_groups.0.albert_layers.0.attention.key.weight', 'albert.encoder.albert_layer_groups.0.albert_layers.0.attention.key.bias', 'albert.encoder.albert_layer_groups.0.albert_layers.0.attention.value.weight', 'albert.encoder.albert_layer_groups.0.albert_layers.0.attention.value.bias', 'albert.encoder.albert_layer_groups.0.albert_layers.0.attention.dense.weight', 'albert.encoder.albert_layer_groups.0.albert_layers.0.attention.dense.bias', 'albert.encoder.albert_layer_groups.0.albert_layers.0.attention.LayerNorm.weight', 'albert.encoder.albert_layer_groups.0.albert_layers.0.attention.LayerNorm.bias', 'albert.encoder.albert_layer_groups.0.albert_layers.0.ffn.weight', 'albert.encoder.albert_layer_groups.0.albert_layers.0.ffn.bias', 'albert.encoder.albert_layer_groups.0.albert_layers.0.ffn_output.weight', 'albert.encoder.albert_layer_groups.0.albert_layers.0.ffn_output.bias', 'qa_outputs.weight', 'qa_outputs.bias'])
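For comparison, here is a minimal sketch of the check I ran on another model. The deepset/bert-base-cased-squad2 checkpoint is just one example; any BertForQuestionAnswering checkpoint should behave the same way:

from transformers import BertForQuestionAnswering

bert_model = BertForQuestionAnswering.from_pretrained("deepset/bert-base-cased-squad2")

# Collect the distinct encoder layer indices that appear in the key names,
# e.g. "bert.encoder.layer.0.attention.self.query.weight" -> "0".
layer_indices = {
    key.split(".")[3]
    for key in bert_model.state_dict()
    if key.startswith("bert.encoder.layer.")
}
print(sorted(layer_indices, key=int))

There, the printed set runs from '0' up through '11', one entry per encoder layer, whereas the ALBERT model above only ever shows albert_layer_groups.0.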