BERT: AttributeError: 'RobertaForMaskedLM' object has no attribute 'bert'

I am trying to freeze some layers of my masked language model using the following code:

for param in model.bert.parameters():
    param.requires_grad = False

However, when I execute the code above, I get this error:

AttributeError: 'RobertaForMaskedLM' object has no attribute 'bert'

In my code, I have the following imports for my masked language model, but I am unsure what is causing the error above:

from transformers import AutoModelForMaskedLM
model = AutoModelForMaskedLM.from_pretrained(model_checkpoint)

So far, I have tried to replace bert with model in my code, but that did not work.

Any help would be good.


The body of the model is named roberta for RoBERTa models, not bert, so you should loop with for param in model.roberta.parameters(). In general, the model-agnostic attribute is base_model, so for param in model.base_model.parameters() should work for any model.
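As a minimal sketch of the base_model approach: the snippet below builds a tiny, randomly initialised RoBERTa model from a config so it runs without downloading anything. The config sizes are illustrative assumptions; in practice you would keep your own AutoModelForMaskedLM.from_pretrained(model_checkpoint) call and only use the freezing loop.

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Tiny random model so the sketch runs offline.
# These sizes are illustrative assumptions, not values from the thread.
config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
)
model = RobertaForMaskedLM(config)

# For a RoBERTa model, base_model resolves to model.roberta.
for param in model.base_model.parameters():
    param.requires_grad = False

# Only parameters outside the body (the language modelling head) stay trainable.
trainable = [name for name, p in model.named_parameters() if p.requires_grad]
print(trainable)
```

Printing the trainable names is a quick way to confirm that only the head is still being updated.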

Okay, thanks for that. Now, is it possible to freeze just the top layer or bottom layer of the BERT model?

Yes, you can use the named_parameters() method for that.
It gives you the names of the parameters along with the parameters themselves,
so you can select only the parameters of the top / bottom layer based on their names and freeze them.


Hello, thanks for the reply. So, do I just simply add this code?


No, you should iterate over them.
Replace the following line: for param in model.roberta.parameters():
with: for name, param in model.roberta.named_parameters():
Then filter the parameters you want to freeze using the name variable.

You may print the names to see what they look like, and then come up with a condition that selects the ones you need.
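A minimal sketch of that name-based filtering, again on a tiny randomly initialised model so it runs offline (the config sizes are illustrative assumptions). It assumes the encoder layers appear under names starting with encoder.layer.<index>. when iterating over model.roberta; print the names on your own model to confirm before relying on the prefix.

```python
from transformers import RobertaConfig, RobertaForMaskedLM

# Tiny random model; sizes are illustrative assumptions.
config = RobertaConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
)
model = RobertaForMaskedLM(config)

# Prefix of the bottom encoder layer. The trailing dot matters:
# without it, "encoder.layer.1" would also match "encoder.layer.10".
bottom_prefix = "encoder.layer.0."

for name, param in model.roberta.named_parameters():
    if name.startswith(bottom_prefix):
        param.requires_grad = False

# Collect the frozen names to check that the condition selected what we expect.
frozen = [name for name, p in model.roberta.named_parameters() if not p.requires_grad]
print(frozen)
```

To freeze the top layer instead, the prefix would be f"encoder.layer.{config.num_hidden_layers - 1}." under the same naming assumption.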

Hope it helps.


Hello, thanks very much for that explanation and solution. It has really helped me quite a lot. In terms of filtering parameters, say my dataset has emojis, can I filter emojis? Like, for example, this 😀 emoji?

Hmm, could you please elaborate? I’m not sure I understood your question.
By the way, if it is unrelated to the original question of this post, you may ask it in a new one.

Apologies, basically I am trying to do masked language modelling using emojis, but when I deploy my model, the predicted tokens only show words, not emojis; hence, I think that the emojis are not frequent enough in the vocabulary, causing them to be less likely for the masked prediction. Therefore, I was wondering if I could freeze some layers of my BERT model to get the less frequent tokens, which are emojis, to be the top predictions when I deploy my masked language model.

@Yuti - How would I go about filtering characters?

Have you tried increasing the masking probability for emojis?
If you have control over the masking when training your model, you can increase the probability that an emoji will be masked, and then the model should output higher probabilities for emojis.

I’m not sure this is a good solution but you may try it.

@Yuti In my code, I have this line which allows me to adjust the masked language probability:

from transformers import DataCollatorForLanguageModeling
data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

However, I am unsure how to adjust it for the emojis specifically.

Changing the mlm_probability argument won't give you the result you need, because it applies the same probability to every token,
but I think you can create a subclass of DataCollatorForLanguageModeling that does the emoji masking.

You can find the source code for DataCollatorForLanguageModeling here.
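As a rough sketch of what such emoji-aware masking could look like, the standalone function below builds a per-token probability matrix and samples the mask from it, which is the same general idea the stock collator uses with a single probability. The function name mask_tokens_with_emoji_boost, its parameters and the token ids in the example are hypothetical. It is also simplified: it always substitutes the mask token and skips the random-token and keep-unchanged replacements that the stock collator applies. In a subclass, this logic would go into the overridden masking method; check the source of your installed transformers version for that method's exact name and signature.

```python
import torch


def mask_tokens_with_emoji_boost(input_ids, emoji_token_ids, mask_token_id,
                                 special_token_ids=(), base_prob=0.15,
                                 emoji_prob=0.5):
    """Hypothetical helper: mask emoji tokens more often than other tokens."""
    labels = input_ids.clone()

    # Start from the ordinary masking probability for every position.
    probability_matrix = torch.full(input_ids.shape, base_prob)

    # Raise the probability wherever the token id belongs to an emoji.
    if emoji_token_ids:
        emoji_ids = torch.tensor(sorted(emoji_token_ids),
                                 dtype=input_ids.dtype, device=input_ids.device)
        probability_matrix[torch.isin(input_ids, emoji_ids)] = emoji_prob

    # Never mask special tokens such as the start and end markers.
    if special_token_ids:
        special_ids = torch.tensor(sorted(special_token_ids),
                                   dtype=input_ids.dtype, device=input_ids.device)
        probability_matrix[torch.isin(input_ids, special_ids)] = 0.0

    # Sample which positions to mask.
    masked = torch.bernoulli(probability_matrix).bool()

    # The loss is computed only on masked positions; -100 is ignored.
    labels[~masked] = -100

    # Simplification: always replace masked positions with the mask token.
    masked_inputs = input_ids.clone()
    masked_inputs[masked] = mask_token_id
    return masked_inputs, labels


# Example with made-up ids: 42 stands for an emoji, 0 and 2 for special tokens.
ids = torch.tensor([[0, 5, 42, 7, 2]])
masked_inputs, labels = mask_tokens_with_emoji_boost(
    ids, emoji_token_ids={42}, mask_token_id=99, special_token_ids={0, 2})
```

The emoji token ids would come from your own tokenizer, for example by converting each emoji you care about to its id, and that only works if the emoji is actually in the vocabulary as a token.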