Freeze encoder for some time and then unfreeze - does it improve the model?


I fear that when training with a randomly initialized prediction head, the gradients might destroy my pretrained language model, even when starting with a warmup LR schedule. Has anybody done experiments with freezing the language model for, let's say, one epoch (training only the head) and then unfreezing everything?
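
To make the schedule concrete, here is a minimal sketch of what I mean. It assumes PyTorch, which is my choice for illustration only. The tiny `Classifier`, the random data, and the names `set_encoder_trainable` and `freeze_epochs` are all hypothetical placeholders: `encoder` stands in for the pretrained language model and `head` for the randomly initialized prediction head.

```python
import torch
from torch import nn

torch.manual_seed(0)


class Classifier(nn.Module):
    """Toy stand-in: `encoder` plays the pretrained LM, `head` is freshly initialized."""

    def __init__(self, hidden=16, num_classes=2):
        super().__init__()
        self.encoder = nn.Sequential(nn.Linear(8, hidden), nn.Tanh())
        self.head = nn.Linear(hidden, num_classes)

    def forward(self, x):
        return self.head(self.encoder(x))


def set_encoder_trainable(model, trainable):
    # Frozen parameters receive no gradient during backward().
    for p in model.encoder.parameters():
        p.requires_grad = trainable


model = Classifier()
x = torch.randn(32, 8)          # random placeholder inputs
y = torch.randint(0, 2, (32,))  # random placeholder labels
loss_fn = nn.CrossEntropyLoss()

# All parameters are registered up front; the optimizer skips any
# parameter whose .grad is None, so frozen ones are left untouched.
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-3)

encoder_before = [p.detach().clone() for p in model.encoder.parameters()]

num_epochs = 3
freeze_epochs = 1  # train only the head for this many epochs
for epoch in range(num_epochs):
    set_encoder_trainable(model, trainable=(epoch >= freeze_epochs))
    for _ in range(5):  # a few steps per "epoch" on the toy batch
        optimizer.zero_grad(set_to_none=True)
        loss = loss_fn(model(x), y)
        loss.backward()
        optimizer.step()
    if epoch == freeze_epochs - 1:
        # Snapshot taken at the end of the frozen phase.
        encoder_after_frozen = [p.detach().clone() for p in model.encoder.parameters()]
```

This only shows the mechanics of the schedule. It says nothing about whether the final model ends up better, which is exactly what I am asking.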

Does that improve the final model?