The point of using a pretrained model if I don't freeze layers

Hi, a newbie here. I am using the Trainer API to fine-tune a BERT model for classification tasks.

If I understand correctly, Trainer doesn't freeze any layers of the pre-trained model, and the tutorials I followed didn't mention freezing either (Fine-tune a pretrained model).

So I followed the tutorial and completed fine-tuning. Afterwards, I dug through some GitHub projects looking for ways to improve accuracy, and saw layer-freezing code like the snippet below. So, without knowing it, I trained the whole network and wasted all of the BERT model's previous learning (right?).

# Freeze the pretrained encoder so its weights are not updated during training;
# only the remaining parameters (e.g. the classification head) stay trainable.
for param in model.roberta.parameters():
    param.requires_grad = False

print("num params:", model.num_parameters())
print("num trainable params:", model.num_parameters(only_trainable=True))

So I want to ask: what is the point of using a pre-trained model (or of transfer learning / fine-tuning as a concept) if I end up training all of BERT's layers anyway? Or should I freeze some layers, as I saw in other people's code?

The part about wasting all of BERT's previous learning is mostly incorrect - fine-tuning a model often causes some degradation and "forgetting" of the knowledge the model learned during pretraining, but in most cases it won't lose very much, and certainly not all of the previous knowledge.

As for the main question - what is the point of using a pre-trained model, and should you freeze some layers?

The main reason not to freeze layers is so that you can harness all of the model's parameters to help it learn the task, i.e. all of the model's layers will be updated to better represent your training data and reduce the loss. In my experience, this usually gives better performance when fine-tuning than freezing layers does.
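For example, with a standard Trainer setup like this rough sketch (bert-base-uncased and two labels are just for illustration, and train_dataset / eval_dataset stand in for your tokenized splits), every parameter has requires_grad=True by default, so the entire network gets updated:

from transformers import AutoModelForSequenceClassification, Trainer, TrainingArguments

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Nothing is frozen, so the trainable count equals the total count.
print("trainable params:", model.num_parameters(only_trainable=True))

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", num_train_epochs=3),
    train_dataset=train_dataset,  # placeholder: your tokenized training split
    eval_dataset=eval_dataset,    # placeholder: your tokenized validation split
)
trainer.train()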

I can't say why the particular repos you were looking at froze layers, but there are reasons to do it in some cases. One is computational efficiency: if you're not updating all of the model's parameters, training is faster and uses less memory (no GPU memory is needed to hold gradients or optimizer states for the frozen parameters). Another is that the dataset may have been very small and they were worried about overfitting, so they reduced the number of parameters dedicated to learning the task. Or they may simply have run experiments and observed that freezing helped on their task.
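If you do want to try freezing for efficiency, one common middle ground (just a sketch, assuming a BertForSequenceClassification with the usual 12 encoder layers) is to freeze the embeddings and the lower encoder layers while leaving the upper layers and the classification head trainable:

from transformers import AutoModelForSequenceClassification

model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased", num_labels=2)

# Freeze the embeddings and the lower 8 encoder layers; the top 4 layers
# and the classification head remain trainable.
for param in model.bert.embeddings.parameters():
    param.requires_grad = False
for layer in model.bert.encoder.layer[:8]:
    for param in layer.parameters():
        param.requires_grad = False

print("trainable params:", model.num_parameters(only_trainable=True))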

Ultimately, I would tend to recommend against freezing layers unless you really need to save memory.
