Freeze encoder for some time and then unfreeze - does it improve the model?

Hi,

I fear that when training with a randomly initialized prediction head, the gradients might destroy my pretrained language model, even when starting with a warmup LR schedule. Did anybody do experiments with freezing the language model for, let's say, one epoch (training only the head) and then unfreezing everything?

Does that improve the final model?

Thanks
Philip

This is an old question, but for anyone who's curious and stumbles upon this thread: yes. I trained a few language models and found that freeze-then-unfreeze works better than freeze-then-stop (i.e., never unfreezing the encoder at all). I haven't formally compared the final performance against training fully unfrozen from the beginning, but hopefully that gap is narrow enough for this practice to count. It's also similar to the gradual unfreezing that some training recipes employ (e.g., for GPT-2).
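In case it helps, here's a minimal sketch of the schedule I mean, in plain PyTorch. The tiny encoder/head modules, layer sizes, learning rates, and epoch counts are just illustrative stand-ins, not real pretrained weights; with a transformers model you'd freeze `model.base_model.parameters()` in the same way.

```python
# Minimal sketch of freeze-then-unfreeze fine-tuning.
# The "encoder" stands in for a pretrained language model and the "head"
# for a randomly initialized prediction head (both are dummies here).
import torch
import torch.nn as nn

torch.manual_seed(0)

encoder = nn.Sequential(nn.Linear(32, 64), nn.ReLU(), nn.Linear(64, 64))  # "pretrained" part
head = nn.Linear(64, 2)                                                   # fresh prediction head
model = nn.Sequential(encoder, head)

def set_encoder_trainable(trainable: bool) -> None:
    for p in encoder.parameters():
        p.requires_grad = trainable

# Dummy data in place of a real dataloader.
x = torch.randn(256, 32)
y = torch.randint(0, 2, (256,))
loss_fn = nn.CrossEntropyLoss()

FREEZE_EPOCHS, TOTAL_EPOCHS = 1, 3

# Phase 1: encoder frozen, optimizer only sees the head's parameters.
set_encoder_trainable(False)
optimizer = torch.optim.AdamW(
    (p for p in model.parameters() if p.requires_grad), lr=1e-3
)

for epoch in range(TOTAL_EPOCHS):
    if epoch == FREEZE_EPOCHS:
        # Phase 2: unfreeze the encoder and rebuild the optimizer so its
        # parameters are included, typically with a lower learning rate.
        set_encoder_trainable(True)
        optimizer = torch.optim.AdamW([
            {"params": encoder.parameters(), "lr": 1e-5},
            {"params": head.parameters(), "lr": 1e-3},
        ])
    for i in range(0, len(x), 32):
        optimizer.zero_grad()
        loss = loss_fn(model(x[i:i + 32]), y[i:i + 32])
        loss.backward()
        optimizer.step()
    print(f"epoch {epoch} loss {loss.item():.4f}")
```

Rebuilding the optimizer at the unfreeze point keeps things simple; you could also add the encoder as a new parameter group to the existing optimizer if you want to preserve its state for the head.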