The body of the model is named roberta for RoBERTa models, not bert. So you should loop with for param in model.roberta.parameters(). In general, the model-agnostic attribute is base_model, so for param in model.base_model.parameters() should work for any architecture.
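A minimal sketch of that freezing pattern, assuming transformers and torch are installed (it uses a tiny randomly-initialised RoBERTa with arbitrary config sizes so nothing needs to be downloaded):

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Tiny randomly-initialised RoBERTa; the config sizes here are arbitrary
# and only chosen so the example runs without downloading a checkpoint.
config = RobertaConfig(vocab_size=1000, hidden_size=32,
                       num_hidden_layers=2, num_attention_heads=2,
                       intermediate_size=64)
model = RobertaForMaskedLM(config)

# base_model is model-agnostic: it resolves to model.roberta here,
# to model.bert for BERT checkpoints, and so on.
for param in model.base_model.parameters():
    param.requires_grad = False

# Only the MLM head (minus weights tied to the embeddings) stays trainable.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Note that the lm_head decoder weight is tied to the word embeddings, so freezing the base model freezes it too; only the untied head parameters remain trainable.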
Yes, you can use the named_parameters() method for that.
It gives you the names of the parameters along with the parameters themselves,
so you can filter only the parameters of the top / bottom layers based on their names and freeze them.
No, you should iterate over them.
Replace the following line: for param in model.roberta.parameters():
with: for name, param in model.roberta.named_parameters():.
Then filter the parameters you want to freeze using the name variable.
You may print the names to see what they look like, and then come up with a condition that filters the ones you need.
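For example, one way to do that filtering (a sketch, again on a tiny randomly-initialised RoBERTa so it runs without a download; which prefixes you freeze is up to you):

```python
from transformers import RobertaConfig, RobertaForMaskedLM

config = RobertaConfig(vocab_size=1000, hidden_size=32,
                       num_hidden_layers=4, num_attention_heads=2,
                       intermediate_size=64)
model = RobertaForMaskedLM(config)

# Inspect the names first; they follow patterns like
# embeddings.word_embeddings.weight, encoder.layer.0.attention... etc.
for name, _ in list(model.roberta.named_parameters())[:4]:
    print(name)

# Freeze the embeddings and the two bottom encoder layers by name prefix.
frozen_prefixes = ("embeddings.", "encoder.layer.0.", "encoder.layer.1.")
for name, param in model.roberta.named_parameters():
    if name.startswith(frozen_prefixes):
        param.requires_grad = False
```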
Hello, thanks very much for that explanation and solution - it has really helped me quite a lot. In terms of filtering parameters, say my dataset has emojis, can I filter emojis? Like, for example, this emoji?
Hmm, could you please elaborate? I’m not sure I understood your question.
BTW, if it is unrelated to the original question of this post, you may ask it in a new one.
Apologies, basically I am trying to do masked language modelling using emojis, but when I deploy my model, the predicted tokens only show words, not emojis; hence, I think that the emojis are not frequent enough in the vocabulary, causing them to be less likely for the masked prediction. Therefore, I was wondering if I could freeze some layers of my BERT model to get the less frequent tokens, which are emojis, to be the top predictions when I deploy my masked language model.
Have you tried increasing the masking probability for emojis?
If you have control over the masking during training, you can increase the probability that an emoji will be masked, and the model will then output higher probabilities for emojis.
I’m not sure this is a good solution but you may try it.
Changing the mlm_probability argument won't give you the result you need,
but I think you can create a subclass of DataCollatorForLanguageModeling that does the emoji masking.
You can find the source code for DataCollatorForLanguageModeling here.
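A sketch of what that subclass could look like. The names EmojiBoostCollator, emoji_token_ids and emoji_mlm_probability are hypothetical (not transformers API), and for brevity the override always substitutes the mask token instead of the 80/10/10 mask/random/keep split the stock collator uses:

```python
import torch
from transformers import DataCollatorForLanguageModeling

class EmojiBoostCollator(DataCollatorForLanguageModeling):
    """Hypothetical collator that masks emoji tokens with a higher
    probability than the usual mlm_probability."""

    def __init__(self, tokenizer, emoji_token_ids, emoji_mlm_probability=0.5, **kwargs):
        super().__init__(tokenizer=tokenizer, **kwargs)
        # Vocabulary ids of your emoji tokens, e.g. from
        # tokenizer.convert_tokens_to_ids(list_of_emoji_tokens).
        self.emoji_token_ids = torch.tensor(sorted(emoji_token_ids))
        self.emoji_mlm_probability = emoji_mlm_probability

    def torch_mask_tokens(self, inputs, special_tokens_mask=None):
        labels = inputs.clone()

        # Base masking probability everywhere, boosted on emoji positions.
        probability_matrix = torch.full(labels.shape, self.mlm_probability)
        probability_matrix[torch.isin(inputs, self.emoji_token_ids)] = self.emoji_mlm_probability

        # Never mask special tokens (same logic as the stock collator).
        if special_tokens_mask is None:
            special_tokens_mask = torch.tensor(
                [self.tokenizer.get_special_tokens_mask(row, already_has_special_tokens=True)
                 for row in labels.tolist()],
                dtype=torch.bool,
            )
        else:
            special_tokens_mask = special_tokens_mask.bool()
        probability_matrix.masked_fill_(special_tokens_mask, value=0.0)

        masked_indices = torch.bernoulli(probability_matrix).bool()
        labels[~masked_indices] = -100  # loss is computed on masked tokens only

        # Simplification: always substitute the mask token here (the stock
        # collator keeps 10% unchanged and randomises another 10%).
        inputs[masked_indices] = self.tokenizer.convert_tokens_to_ids(self.tokenizer.mask_token)
        return inputs, labels
```

You would then pass an instance of this class as data_collator when training, exactly as you would the stock DataCollatorForLanguageModeling.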