Freeze encoder for some time and then unfreeze - does it improve the model?

Hi,

I fear that when training with a randomly initialized prediction head, the gradients might destroy my pretrained language model, even when starting with a warmup LR schedule. Did anybody do experiments with freezing the language model for, let's say, one epoch (training only the head) and then unfreezing everything?

Does that improve the final model?

Thanks
Philip

This is an old question, but for anyone who's curious and stumbles upon this thread: yes. I trained a few language models and found that freeze-then-unfreeze works better than freeze-then-stop (i.e., never unfreezing the encoder at all). I haven't formally compared the final performance against training fully unfrozen from the beginning, but hopefully that gap is narrow enough for this practice to count. It's also similar to the gradual unfreezing that some training recipes employ (e.g., for GPT-2).
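In case it helps, here's a minimal sketch of the schedule I mean, in plain PyTorch. The tiny encoder/head modules, layer sizes, learning rates, and epoch counts are just illustrative stand-ins, not real pretrained weights; with a transformers model you'd freeze `model.base_model.parameters()` in the same way.

```python
# Minimal sketch of freeze-then-unfreeze fine-tuning.
# The "encoder" stands in for a pretrained language model and the "head"
# for a randomly initialized prediction head (both are dummies here).
import torch
import torch.nn as nn

torch.manual_seed(0)

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))  # "pretrained" part
head = nn.Linear(64, 2)                                                   # fresh prediction head
model = nn.Sequential(encoder, head)

def set_encoder_trainable(trainable: bool) -> None:
    for p in encoder.parameters():
        p.requires_grad = trainable

# Dummy data in place of a real dataloader.
x = torch.randn(256, 32)
y = torch.randint(0, 2, (256,))
loss_fn = nn.CrossEntropyLoss()

FREEZE_EPOCHS, TOTAL_EPOCHS = 1, 3

# Phase 1: encoder frozen, optimizer only sees the head's parameters.
set_encoder_trainable(False)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

for epoch in range(TOTAL_EPOCHS):
    if epoch == FREEZE_EPOCHS:
        # Phase 2: unfreeze the encoder and rebuild the optimizer so its
        # parameters are included, typically with a lower learning rate.
        set_encoder_trainable(True)
        optimizer = torch.optim.AdamW([
            {"params": encoder.parameters(), "lr": 1e-5},
            {"params": head.parameters(), "lr": 1e-3},
        ])
    for i in range(0, len(x), 32):
        optimizer.zero_grad()
        loss = loss_fn(model(x[i:i + 32]), y[i:i + 32])
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch} loss {loss.item():.4f}")
```

Rebuilding the optimizer at the unfreeze point keeps things simple; you could also add the encoder as a new parameter group to the existing optimizer if you want to preserve its state for the head.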