- Currently, the consensus seems to be that you get the best results on a downstream task by not freezing any layers at all. If you're freezing the weights to save memory, I'd suggest looking into the adapter approach instead. The basic idea is to insert small additional trainable layers between the existing frozen layers of the Transformer (see the sketch below). It should help, but there's no guarantee the results will be on par with full fine-tuning.
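
  A minimal sketch of that idea in plain PyTorch, assuming the Hugging Face `transformers` BERT implementation; the `BottleneckAdapter` and `AdaptedLayer` names, the bottleneck size, and the checkpoint are placeholders for illustration, not a specific adapter library's API:

  ```python
  import torch.nn as nn
  from transformers import BertModel


  class BottleneckAdapter(nn.Module):
      """Small trainable bottleneck with a residual connection."""
      def __init__(self, hidden_size, bottleneck_size=64):
          super().__init__()
          self.down = nn.Linear(hidden_size, bottleneck_size)
          self.up = nn.Linear(bottleneck_size, hidden_size)
          self.act = nn.GELU()

      def forward(self, hidden_states):
          return hidden_states + self.up(self.act(self.down(hidden_states)))


  class AdaptedLayer(nn.Module):
      """Wraps a frozen encoder layer and passes its output through an adapter."""
      def __init__(self, layer, adapter):
          super().__init__()
          self.layer = layer
          self.adapter = adapter

      def forward(self, hidden_states, *args, **kwargs):
          outputs = self.layer(hidden_states, *args, **kwargs)
          # BertLayer returns a tuple; element 0 holds the hidden states
          return (self.adapter(outputs[0]),) + outputs[1:]


  model = BertModel.from_pretrained("bert-base-uncased")

  # Freeze every pre-trained weight...
  for param in model.parameters():
      param.requires_grad = False

  # ...then insert a trainable adapter after each frozen encoder layer
  hidden = model.config.hidden_size
  for i, layer in enumerate(model.encoder.layer):
      model.encoder.layer[i] = AdaptedLayer(layer, BottleneckAdapter(hidden))

  # Only the adapter parameters (plus whatever task head you put on top)
  # will receive gradients during fine-tuning.
  trainable = [n for n, p in model.named_parameters() if p.requires_grad]
  print(len(trainable), "trainable parameter tensors")
  ```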
- Here I assume you mean continuing to train an existing pre-trained BERT with the MLM objective on your own data. This may help, but it depends on the kind of texts you're trying to classify. If you have reason to believe these texts are noticeably different from the texts BERT was pre-trained on, then it's likely to improve the results, although it may hurt generalization to other domains. A rough sketch of how that could look is given below.
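
  A minimal sketch of continued MLM pre-training with the `transformers` and `datasets` libraries; the `domain_texts.txt` file, output directory, and hyperparameters are placeholder assumptions:

  ```python
  from datasets import load_dataset
  from transformers import (
      BertForMaskedLM,
      BertTokenizerFast,
      DataCollatorForLanguageModeling,
      Trainer,
      TrainingArguments,
  )

  model_name = "bert-base-uncased"
  tokenizer = BertTokenizerFast.from_pretrained(model_name)
  model = BertForMaskedLM.from_pretrained(model_name)

  # One raw text per line in a plain-text file with your domain data
  dataset = load_dataset("text", data_files={"train": "domain_texts.txt"})["train"]

  def tokenize(batch):
      return tokenizer(batch["text"], truncation=True, max_length=128)

  tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

  # The collator randomly masks 15% of tokens, matching BERT's MLM objective
  collator = DataCollatorForLanguageModeling(
      tokenizer=tokenizer, mlm=True, mlm_probability=0.15
  )

  trainer = Trainer(
      model=model,
      args=TrainingArguments(output_dir="bert-domain-mlm", num_train_epochs=1),
      train_dataset=tokenized,
      data_collator=collator,
  )
  trainer.train()

  # Afterwards, load the adapted weights into a classification model, e.g.
  # BertForSequenceClassification.from_pretrained("<saved checkpoint dir>")
  ```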
It’s a safe bet that simply unfreezing all the weights and fine-tuning the whole model will work best, so I’d start with that if it’s an option (a minimal example below).
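
For comparison, full fine-tuning is just the standard sequence-classification setup with nothing frozen; the tiny in-memory dataset and hyperparameters here are placeholders:

```python
from datasets import Dataset
from transformers import (
    BertForSequenceClassification,
    BertTokenizerFast,
    Trainer,
    TrainingArguments,
)

tokenizer = BertTokenizerFast.from_pretrained("bert-base-uncased")
model = BertForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)  # no requires_grad = False anywhere: every layer is updated

data = Dataset.from_dict(
    {"text": ["great product", "terrible service"], "label": [1, 0]}
)
data = data.map(
    lambda batch: tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=32
    ),
    batched=True,
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="bert-finetuned", num_train_epochs=1),
    train_dataset=data,
)
trainer.train()
```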