How to freeze some layers of BertModel

I have a pytorch model with BertModel as the main part and a custom head. I want to freeze the embedding layer and the first few encoding layers, so that I can fine-tune the attention weights of the last few encoding layers and the weights of the custom layers.

I tried:

ct = 0
for child in model.children():
    ct += 1
    if ct < 11:  # change this value - freezes the first 10 children
        for param in child.parameters():
            param.requires_grad = False
but I’m not sure that did what I want.

I then ran this to check, but the layer names aren’t recognized:

print(L1bb.embeddings.word_embeddings.weight.requires_grad)
print(L1bb.encoder.layer[0].output.dense.weight.requires_grad)
print(L1bb.encoder.layer[3].output.dense.weight.requires_grad)
print(L1bb.encoder.layer[6].output.dense.weight.requires_grad)
print(L1bb.encoder.layer[9].output.dense.weight.requires_grad)
print(L1bb.pooler.dense.weight.requires_grad)
print(L4Lin.requires_grad)

L1bb is the name of the BertModel section in my model, and L1bb.embeddings.word_embeddings.weight is shown in the output of the code that instantiates the model.
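A minimal sketch for discovering the exact names and flags, assuming the wrapping model is called model and the BertModel is stored as model.L1bb (both names are assumptions based on the post above; adjust to your own attributes). The individual encoder blocks live in a ModuleList, so they need bracket indexing:

# List every parameter name together with its requires_grad flag;
# the printed names are exactly what you use for attribute access.
for name, param in model.named_parameters():
    print(name, param.requires_grad)

# `model.L1bb` is assumed from the description above - adjust to your model.
print(model.L1bb.encoder.layer[0].output.dense.weight.requires_grad)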

How can I freeze the first n layers?
What counts as a layer?
What are the names of the layers in BertModel?
How can I check which layers are frozen?

PS: how can I format this pasted code as code in the forum post? One section was formatted automatically, but nothing I try seems to affect the rest.

You should not rely on the order returned by the parameters method, as it does not necessarily match the order of the layers in your model. Instead, you should call it on specific parts of your model:

modules = [L1bb.embeddings, *L1bb.encoder.layer[:5]] #Replace 5 by what you want
for module in mdoules:
    for param in module.parameters():
        param.requires_grad = False

will freeze the embeddings layer and the first 5 transformer layers.
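After running this, a quick way to confirm the split between frozen and trainable parameters is to count them. A small sketch, assuming the full wrapping model is called model:

# Count how many parameters will and will not receive gradient updates.
frozen = sum(p.numel() for p in model.parameters() if not p.requires_grad)
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
print(f"frozen: {frozen:,}  trainable: {trainable:,}")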

Thank you @sgugger, I think that’s working now.

Hi, quick question on accessing the embeddings and encoder layers in a BERT model. This:

from transformers import BertForMaskedLM
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
model.encoder

throws the error ModuleAttributeError: 'BertForMaskedLM' object has no attribute 'encoder'

It depends on the model you are using. In general, model.base_model should point to the encoder. Otherwise, print your model and double-check the name :slight_smile:
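For example, with BertForMaskedLM the BertModel backbone is stored under the bert attribute, so both of the accesses below should reach the encoder (worth confirming with print(model) on your version):

from transformers import BertForMaskedLM

model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# The backbone is exposed both as model.bert and via the generic base_model alias.
print(model.base_model is model.bert)      # True
encoder_layers = model.bert.encoder.layer  # ModuleList of 12 transformer blocks

# Freezing the embeddings and the first 5 blocks, as in the snippet above:
for module in [model.bert.embeddings, *encoder_layers[:5]]:
    for param in module.parameters():
        param.requires_grad = False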

@sgugger is there any similar way to freeze, say, 3 layers of TFDistilBertModel.from_pretrained()? Thanks in advance!
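For the TF models, freezing goes through Keras’ trainable attribute rather than requires_grad. A minimal sketch, assuming the DistilBERT blocks are exposed as model.distilbert.transformer.layer (this attribute layout is an assumption; print the model or inspect model.layers to confirm the names in your version):

from transformers import TFDistilBertModel

model = TFDistilBertModel.from_pretrained("distilbert-base-uncased")

# Freeze the embeddings and the first 3 transformer blocks by switching off
# the Keras `trainable` flag (do this before compiling the model).
model.distilbert.embeddings.trainable = False
for block in model.distilbert.transformer.layer[:3]:
    block.trainable = False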

When I try the modules snippet posted above, I get this error:

NameError: name 'mdoules' is not defined

Hi @AnanadP2812,

‘mdoules’ was a typo. Try ‘modules’.

Rachael.

hey @sgugger, I tried your code with the BERT model (model = BertModel.from_pretrained("bert-base-cased")), so the code would be:

modules = [model.embeddings, *model.encoder.layer[:5]]  # Replace 5 by what you want
for module in modules:
    for param in module.parameters():
        param.requires_grad = False

However, the changes did not seem to pass into the model. Did I miss anything? How do I call the new model?
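Setting requires_grad = False modifies the model in place, so there is no new model to call: keep using the same model object, and the frozen parameters simply stop receiving gradient updates. A short sketch of the usual follow-up step (the optimizer choice and learning rate here are only illustrative):

import torch

# requires_grad was changed in place on `model`, so keep training that same object.
# Optionally build the optimizer only from the parameters that are still trainable:
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=2e-5
)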