How to freeze layers using trainer?

AJHoeh · March 19, 2021, 11:44am

Hey,

I am trying to figure out how to freeze layers of a model and read that I had to use

for param in model.base_model.parameters():
    param.requires_grad = False

if I wanted to freeze the encoder of a pretrained MLM for example. But how do I use this with the Trainer?
I tried the following:

import torch
from transformers import BertTokenizer, BertForMaskedLM, LineByLineTextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments
model = BertForMaskedLM.from_pretrained('bert-base-uncased')
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')

for param in model.base_model.parameters():
    param.requires_grad = False

dataset = LineByLineTextDataset(
    tokenizer=tokenizer,
    file_path=in_path,
    block_size=512,
)

data_collator = DataCollatorForLanguageModeling(
    tokenizer=tokenizer, mlm=True, mlm_probability=0.15
)

training_args = TrainingArguments(
    output_dir=out_path,
    overwrite_output_dir=True,
    num_train_epochs=25,
    per_device_train_batch_size=48,
    save_steps=500,
    save_total_limit=2,
    seed=1
)

trainer = Trainer(
    model=model,
    args=training_args,
    data_collator=data_collator,
    train_dataset=dataset
)

trainer.train()

If the encoder was frozen I would expect it to produce the same outputs as a fresh instance of the pretrained encoder, but it doesn’t:

model_fresh = BertForMaskedLM.from_pretrained('bert-base-uncased')
inputs = tokenizer("This is a boring test sentence", return_tensors="pt")
torch.all(model.bert(**inputs)[0].eq(model_fresh.bert(**inputs)[0]))
--> tensor(False)

So I must be doing something wrong here. I guess the Trainer is resetting the requires_grad attribute and I have to override it somehow after I instantiate the trainer?

Thanks in advance!
Johannes

nielsr · March 19, 2021, 12:56pm

Looking at the source code of BertForMaskedLM, the base model is the “bert” attribute, not the “base_model” attribute. So if you want to freeze the parameters of the base model before training, you should type

for param in model.bert.parameters():
    param.requires_grad = False

instead.

sgugger · March 19, 2021, 12:58pm

@nielsr base_model is an attribute that works on any PreTrainedModel (to make it easy to access the encoder in a generic fashion).

The Trainer puts your model into training mode, so your difference might simply come from that (there are dropouts in the model). You should check if putting it back in eval mode solves your problem.
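To illustrate the train/eval point above, here is a minimal sketch using a toy module rather than BERT. The `toy` network and its sizes are made up for illustration; the only assumption is that it contains a dropout layer, as the BERT encoder does.

```python
import torch
from torch import nn

torch.manual_seed(0)

# Hypothetical stand-in for an encoder that contains dropout.
toy = nn.Sequential(nn.Linear(16, 128), nn.Dropout(p=0.5))
x = torch.randn(1, 16)

toy.train()  # training mode: dropout randomly zeroes activations
train_a, train_b = toy(x), toy(x)

toy.eval()  # eval mode: dropout is a no-op
with torch.no_grad():
    eval_a, eval_b = toy(x), toy(x)

# Two passes over the same input only agree once dropout is disabled.
same_in_train = torch.equal(train_a, train_b)
same_in_eval = torch.equal(eval_a, eval_b)
print(toy.training, same_in_train, same_in_eval)
```

For the comparison in the question, this would mean calling `model.eval()` (and `model_fresh.eval()`) before comparing the encoder outputs.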

nielsr · March 19, 2021, 12:59pm

@sgugger oh didn’t know that, I learn every day!

AJHoeh · March 19, 2021, 1:01pm

OMG! This is so obvious and I can’t believe I didn’t realize that. Will test and report! Thanks

AJHoeh · March 19, 2021, 4:13pm

@sgugger model.eval() should have done the trick, right? I am afraid the results still don’t match

sgugger · March 19, 2021, 4:37pm

You should inspect the weights to see where they differ then. Trainer will not change the requires_grad value of your parameters.
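One way to do that inspection, sketched on a toy model so it runs without downloading anything. The helpers `changed_parameters` and `trainable_parameters` are hypothetical names, not part of transformers, and the "encoder"/"head" modules only stand in for the real ones.

```python
import copy

import torch
from torch import nn


def changed_parameters(model_a, model_b):
    """Return the names of parameters whose values differ between two models."""
    params_b = dict(model_b.named_parameters())
    return [
        name
        for name, p_a in model_a.named_parameters()
        if not torch.equal(p_a.detach(), params_b[name].detach())
    ]


def trainable_parameters(model):
    """Return the names of parameters that still have requires_grad=True."""
    return [name for name, p in model.named_parameters() if p.requires_grad]


torch.manual_seed(0)

# Toy "pretrained" model: a frozen encoder plus a trainable head.
reference = nn.ModuleDict({"encoder": nn.Linear(4, 4), "head": nn.Linear(4, 2)})
trained = copy.deepcopy(reference)
for param in trained["encoder"].parameters():
    param.requires_grad = False

# One optimizer step over the parameters that are still trainable.
optimizer = torch.optim.SGD(
    [p for p in trained.parameters() if p.requires_grad], lr=0.1
)
loss = trained["head"](trained["encoder"](torch.randn(8, 4))).sum()
loss.backward()
optimizer.step()

changed = changed_parameters(trained, reference)
trainable = trainable_parameters(trained)
print("changed:", changed)
print("trainable:", trainable)
```

Applied to the thread's setup, the same two helpers could be called with `model` and `model_fresh`: if any name under the encoder shows up in `changed`, the freeze did not hold for that parameter.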

AJHoeh · March 22, 2021, 10:53am

@sgugger Thanks, that was important for me to know. It meant I had to be the one screwing up somewhere else, and I did somehow manage that.

anon58275033 · July 14, 2021, 7:55pm

Hi,

I tried your code, but I am getting this error:

AttributeError: 'RobertaForMaskedLM' object has no attribute 'bert'

AJHoeh · July 14, 2021, 8:26pm

Hey,

yeah, this is because you are using roberta instead of bert, so the encoder is stored under .roberta. I believe there is some model-independent keyword like “base_model” or something, but I don’t remember right now (I’m on vacation, but maybe you can try it or google it). Hope that helps!

Best
Johannes

AJHoeh · July 14, 2021, 8:30pm

Sorry, I answered by email. sgugger literally provided the base_model keyword in this thread, so there you go.
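A small sketch of the model-independent variant. It builds tiny, randomly initialised models from configs (the sizes are arbitrary, chosen only so nothing is downloaded), and `freeze_base_model` is a hypothetical helper name, not a transformers function.

```python
from transformers import (
    BertConfig,
    BertForMaskedLM,
    RobertaConfig,
    RobertaForMaskedLM,
)

# Arbitrary tiny sizes, for illustration only.
tiny = dict(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
)
bert = BertForMaskedLM(BertConfig(**tiny))
roberta = RobertaForMaskedLM(RobertaConfig(**tiny))


def freeze_base_model(model):
    """Hypothetical helper: freeze the encoder whatever its attribute is called."""
    for param in model.base_model.parameters():
        param.requires_grad = False


for m in (bert, roberta):
    freeze_base_model(m)

# base_model resolves to .bert or .roberta depending on the model class.
bert_same = bert.base_model is bert.bert
roberta_same = roberta.base_model is roberta.roberta
frozen = all(not p.requires_grad for p in roberta.roberta.parameters())
still_trainable = [n for n, p in roberta.named_parameters() if p.requires_grad]
print(bert_same, roberta_same, frozen)
print(still_trainable)
```

With a pretrained checkpoint the same loop should apply unchanged after `from_pretrained`; only the model construction differs.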

Hanqix · May 26, 2024, 5:55pm

Maybe it’s a problem with using eq instead of isclose?
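A minimal sketch of the difference between the two comparisons, using two float64 values that differ only by rounding error. The numbers are illustrative and not taken from the thread.

```python
import torch

# 0.1 + 0.2 and 0.3 differ in the last bits of their float64 representation.
a = torch.tensor([0.1 + 0.2], dtype=torch.float64)
b = torch.tensor([0.3], dtype=torch.float64)

exact = torch.all(a.eq(b))         # bitwise-exact comparison
elementwise = torch.isclose(a, b)  # per-element comparison within a tolerance
close = torch.allclose(a, b)       # single bool: all elements within tolerance

print(exact, elementwise, close)
```

A tolerance-based check like this only accounts for small numerical noise. If the encoder weights themselves had changed, or dropout were still active, the outputs would likely differ by more than the default tolerances.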
