How to freeze some layers of BertModel

I have a pytorch model with BertModel as the main part and a custom head. I want to freeze the embedding layer and the first few encoding layers, so that I can fine-tune the attention weights of the last few encoding layers and the weights of the custom layers.

I tried:

ct = 0
for child in model.children():
    ct += 1
    if ct < 11:          # ########## change value - this freezes layers 1-10
        for param in child.parameters():
            param.requires_grad = False

but I’m not sure that did what I want.
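
To see what that counter is actually walking over, I think something like this rough sketch (not verified) would list the top-level children:

for i, (name, child) in enumerate(model.named_children()):
    # print each top-level child's name and class so the ct threshold can be sanity-checked
    print(i, name, type(child).__name__)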

I then ran this to check, but the layer names aren't recognized:

print(L1bb.embeddings.word_embeddings.weight.requires_grad)
print(L1bb.encoder.layer[0].output.dense.weight.requires_grad)
print(L1bb.encoder.layer[3].output.dense.weight.requires_grad)
print(L1bb.encoder.layer[6].output.dense.weight.requires_grad)
print(L1bb.encoder.layer[9].output.dense.weight.requires_grad)
print(L1bb.pooler.dense.weight.requires_grad)
print(L4Lin.weight.requires_grad)

L1bb is the name of the BertModel section in my model, and L1bb.embeddings.word_embeddings.weight is shown in the output of the code that instantiates the model.
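
In case it helps, I think a loop over named_parameters should dump every parameter's full path together with its requires_grad flag, so the real attribute names can be read off (rough sketch, not verified):

for name, param in model.named_parameters():
    # prints e.g. "L1bb.embeddings.word_embeddings.weight True"
    print(name, param.requires_grad)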

How can I freeze the first n layers?
What counts as a layer?
What are the names of the layers in BertModel?
How can I check which layers are frozen?

PS: how can I format this pasted code as code in the forum post? One section was formatted automatically, but nothing I try seems to affect the rest.

You should not rely on the order returned by the parameters method, as it does not necessarily match the order of the layers in your model. Instead, you should use it on specific parts of your model:

modules = [L1bb.embeddings, *L1bb.encoder.layer[:5]] #Replace 5 by what you want
for module in mdoules:
    for param in module.parameters():
        param.requires_grad = False

will freeze the embeddings layer and the first 5 transformer layers.
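
If you want to double-check afterwards, a quick sketch along these lines should show what is frozen and what is still trainable (model being your full model with the custom head):

frozen = [n for n, p in model.named_parameters() if not p.requires_grad]
trainable = [n for n, p in model.named_parameters() if p.requires_grad]
print(f"{len(frozen)} frozen / {len(trainable)} trainable parameters")
print(frozen[:5])   # the embedding and first encoder layer weights should appear here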

Thank you @sgugger, I think that’s working now.

Hi, quick question on accessing the embeddings and encoder layers in a BERT model. This:

from transformers import BertForMaskedLM
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.encoder

throws the error ModuleAttributeError: 'BertForMaskedLM' object has no attribute 'encoder'

It depends on the model you are using. In general, model.base_model should point to the encoder. Otherwise, print your model and double-check the name.
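
For instance, with the BertForMaskedLM above, a rough sketch like this should reveal where the encoder actually lives (the printed names are just what I would expect, so double-check on your side):

from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained('bert-base-uncased')

# base_model points to the wrapped BertModel, which owns the encoder
print(type(model.base_model).__name__)
print(model.base_model.encoder.layer[0].__class__.__name__)

# or list the top-level attribute names to find the right prefix (e.g. 'bert')
print([name for name, _ in model.named_children()])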

@sgugger is there any similar way to freeze, say, 3 layers of TFDistilBertModel.from_pretrained()? Thanks in advance!

When I try that, I get this error:

NameError: name 'mdoules' is not defined

Hi @AnanadP2812,

‘mdoules’ was a typo. Try ‘modules’.

Rachael.

hey @sgugger, I tried your code with the BERT model (model = BertModel.from_pretrained("bert-base-cased")),

so the code would be:

modules = [model.embeddings, model.encoder.layer[:5]] #Replace 5 by what you want
for module in modules:
    for param in module.parameters():
        param.requires_grad = False

However, the changes did not seem to take effect in the model. Did I miss anything? How do I call the new model?