I downloaded a pretrained model from Hugging Face and added some layers to it.
Now I only want to update the layers I added, so I need to freeze the layers of the pretrained model while training. How can I do it?
Maybe `requires_grad=False`?
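For example, a minimal sketch of that idea, assuming a BERT backbone with a small added head; the wrapper class and the linear head are illustrative, not from the thread:

```Python
import torch.nn as nn
from transformers import AutoModel


class BackboneWithHead(nn.Module):
    def __init__(self):
        super().__init__()
        self.backbone = AutoModel.from_pretrained("bert-base-uncased")
        # Freeze every pretrained weight; only the newly added head will train.
        for param in self.backbone.parameters():
            param.requires_grad = False
        self.head = nn.Linear(self.backbone.config.hidden_size, 2)

    def forward(self, input_ids, attention_mask=None):
        hidden = self.backbone(input_ids, attention_mask=attention_mask).last_hidden_state
        return self.head(hidden[:, 0])  # classify from the [CLS] position


model = BackboneWithHead()
trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
total = sum(p.numel() for p in model.parameters())
print(f"trainable parameters: {trainable} / {total}")  # only the head remains trainable
```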
Hey,
I am trying to figure out how to freeze layers of a model and read that I had to use
for param in model.base_model.parameters():
    param.requires_grad = False
if I wanted to freeze the encoder of a pretrained MLM for example. But how do I use this with the Trainer?
I tried the following:
from transformers import BertTokenizer, BertForMaskedLM, LineByLineTextDataset, DataCollatorForLanguageModeling, Trainer, TrainingArguments
model = BertForMaskedLM.from_pretrained('bert-base-uncased'…
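A rough sketch of the pattern the quoted post is after: freeze the encoder, then pass the model to `Trainer` as usual. The tiny in-memory dataset and the training arguments below are placeholders so the example runs end to end; they are not the objects from the truncated snippet.

```Python
from transformers import (
    BertForMaskedLM,
    BertTokenizer,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizer.from_pretrained("bert-base-uncased")
model = BertForMaskedLM.from_pretrained("bert-base-uncased")

# Freeze the pretrained encoder; only the MLM head keeps requires_grad=True.
for param in model.base_model.parameters():
    param.requires_grad = False

# Tiny in-memory "dataset" just so the sketch runs; replace with your own data.
texts = ["The quick brown fox jumps over the lazy dog.", "Freezing the encoder."]
train_dataset = [tokenizer(t, truncation=True, max_length=32) for t in texts]

data_collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm=True, mlm_probability=0.15)

trainer = Trainer(
    model=model,
    args=TrainingArguments(
        output_dir="mlm-frozen-encoder",
        num_train_epochs=1,
        per_device_train_batch_size=2,
        report_to="none",
    ),
    train_dataset=train_dataset,
    data_collator=data_collator,
)
trainer.train()
```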
opened 07:46PM - 18 Jan 23 UTC · closed 02:28PM - 23 Jan 23 UTC
### Feature request
Attempt to optimize the training for models with weights/parameters that are set to `requires_grad=False`. This is done by excluding these parameters from the optimizer.
### Motivation
I am building a Seq2Seq model where I use a pre-trained model for the encoder. I freeze all the parameters of the encoder by setting `requires_grad=False`. I expected training to speed up compared to a model where both the encoder and decoder weights are trainable. However, I found that there is no difference in either speed or memory usage.
I investigated a bit and found that all of the model's parameters, regardless of whether they require gradients, are included in the optimizer: https://github.com/huggingface/transformers/blob/00ba7cadd812437708b380ab078a3cfe8cfaff31/src/transformers/trainer.py#L1021-L1030
To test the idea, I subclassed `Seq2SeqTrainer` and updated the above snippet as follows:
```Python
optimizer_grouped_parameters = [
    {
        # Add the `p.requires_grad` condition here
        "params": [p for n, p in opt_model.named_parameters() if (n in decay_parameters and p.requires_grad)],
        "weight_decay": self.args.weight_decay,
    },
    {
        # Add the `p.requires_grad` condition here
        "params": [p for n, p in opt_model.named_parameters() if (n not in decay_parameters and p.requires_grad)],
        "weight_decay": 0.0,
    },
]
```
Doing this actually improved both speed and memory usage during training.
I was wondering if this is something we can add to the codebase. If not, I am curious why we shouldn't exclude parameters that are not meant to be trainable from the optimizer.
### Your contribution
I can make the PR if this is an acceptable change. 🤗
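For reference, a fuller version of the subclass described in that issue might look roughly like this. It is only a sketch: `FrozenAwareSeq2SeqTrainer` is an illustrative name, and the `create_optimizer` hook, `get_parameter_names`, and `ALL_LAYERNORM_LAYERS` helpers are assumed from the `transformers` internals linked above, so exact names may differ between versions.

```Python
from transformers import Seq2SeqTrainer
from transformers.pytorch_utils import ALL_LAYERNORM_LAYERS
from transformers.trainer_pt_utils import get_parameter_names


class FrozenAwareSeq2SeqTrainer(Seq2SeqTrainer):
    """Seq2SeqTrainer that leaves requires_grad=False parameters out of the optimizer."""

    def create_optimizer(self):
        opt_model = self.model
        if self.optimizer is None:
            # Same weight-decay grouping as the stock Trainer, but every group
            # additionally filters on p.requires_grad so frozen weights are skipped.
            decay_parameters = get_parameter_names(opt_model, ALL_LAYERNORM_LAYERS)
            decay_parameters = [name for name in decay_parameters if "bias" not in name]
            optimizer_grouped_parameters = [
                {
                    "params": [
                        p for n, p in opt_model.named_parameters()
                        if n in decay_parameters and p.requires_grad
                    ],
                    "weight_decay": self.args.weight_decay,
                },
                {
                    "params": [
                        p for n, p in opt_model.named_parameters()
                        if n not in decay_parameters and p.requires_grad
                    ],
                    "weight_decay": 0.0,
                },
            ]
            optimizer_cls, optimizer_kwargs = self.get_optimizer_cls_and_kwargs(self.args)
            self.optimizer = optimizer_cls(optimizer_grouped_parameters, **optimizer_kwargs)
        return self.optimizer
```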