Unfreeze BERT vs pre-train BERT for Sentiment Analysis

MahdiA · December 22, 2021, 2:28pm

I am doing sentiment analysis on some text reviews, but I am not getting good results.
I use BERT for feature extraction and a fully connected layer as the classifier.

I am planning the following experiments, but I do not have a general sense of what results to expect. I have two options:
1- Unfreeze some Transformer layers and let the gradient propagate through those layers
2- Continue pre-training BERT with masked language modeling on related texts and then train the classifier.

Which one should take priority? Or does it just depend on experiments?

adorkin · December 23, 2021, 5:03pm

1- Currently, the consensus seems to be that to get the best results when fine-tuning on a downstream task you don’t freeze any layers at all. If you’re freezing the weights to save memory, then I’d suggest considering the Adapter Framework. The idea, basically, is to insert additional trainable layers in between the existing frozen layers of a Transformer model. It should help, but there’s no guarantee that the results will be on par with full fine-tuning.
2- Here I assume that you mean further training an existing pre-trained BERT with the MLM objective. This may help, but it depends on the kind of texts you’re trying to classify. If you have reason to believe that these texts are noticeably different from the texts that BERT was trained on, then it’s likely to improve the results, although it may hinder the model’s ability to generalize.
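To make the MLM objective concrete, here is a minimal sketch of the masking step as described for the original BERT: about 15% of tokens are selected, and of those 80% become `[MASK]`, 10% are swapped for a random token, and 10% are left unchanged. The function name `mask_tokens`, the toy vocabulary, and the use of `None` as the "ignore" label are assumptions made for illustration. In practice, `DataCollatorForLanguageModeling` in the `transformers` library applies the same idea to token IDs.

```python
import random

MASK = "[MASK]"
IGNORE = None  # positions with this label would not contribute to the MLM loss


def mask_tokens(tokens, vocab, mlm_prob=0.15, rng=random,
                special=("[CLS]", "[SEP]", "[PAD]")):
    """Return (inputs, labels) for one tokenized sentence.

    Hypothetical helper: `vocab` is any list of replacement tokens and
    `rng` is the `random` module or a `random.Random` instance.
    """
    inputs = list(tokens)
    labels = [IGNORE] * len(tokens)
    for i, tok in enumerate(tokens):
        # Never mask special tokens; select the rest with probability mlm_prob.
        if tok in special or rng.random() >= mlm_prob:
            continue
        labels[i] = tok  # the model has to predict the original token here
        roll = rng.random()
        if roll < 0.8:
            inputs[i] = MASK               # 80%: replace with [MASK]
        elif roll < 0.9:
            inputs[i] = rng.choice(vocab)  # 10%: replace with a random token
        # remaining 10%: keep the original token in the input
    return inputs, labels


# Toy review sentence; a real run would use in-domain review texts.
tokens = ["[CLS]", "the", "battery", "life", "is", "great", "[SEP]"]
vocab = ["the", "battery", "life", "is", "great", "bad", "screen"]
inputs, labels = mask_tokens(tokens, vocab, rng=random.Random(0))
```

Only the positions with a non-`None` label are scored, so the model learns in-domain vocabulary and phrasing from unlabeled reviews before the classifier is trained.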

It’s a safe bet to say that just unfreezing the weights will be the most advantageous, so I’d start with that, if it’s an option.
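For reference, both full and partial unfreezing come down to toggling `requires_grad` on the model's parameters. The sketch below is one way to do that, assuming the Hugging Face BERT parameter naming (`...encoder.layer.<i>...`) and a classification head whose parameters start with `classifier`. The helper name `set_trainable` and its arguments are hypothetical, not part of any library.

```python
import re

# Hugging Face BERT checkpoints name encoder-block parameters like
# "bert.encoder.layer.11.attention.self.query.weight".
LAYER_RE = re.compile(r"encoder\.layer\.(\d+)\.")


def set_trainable(named_params, num_layers=12, unfreeze_last=None,
                  head_prefixes=("classifier",)):
    """Toggle requires_grad on (name, param) pairs; return trainable names.

    unfreeze_last=None -> everything is trainable (full fine-tuning).
    unfreeze_last=k    -> only the top k encoder blocks and the head train;
                          embeddings, lower blocks and the pooler stay frozen.
    """
    trainable = []
    for name, param in named_params:
        if unfreeze_last is None or name.startswith(head_prefixes):
            keep = True
        else:
            match = LAYER_RE.search(name)
            keep = (match is not None
                    and int(match.group(1)) >= num_layers - unfreeze_last)
        param.requires_grad = keep
        if keep:
            trainable.append(name)
    return trainable


# Usage with a real model (assumes torch and transformers are installed):
#   model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")
#   set_trainable(model.named_parameters(),
#                 num_layers=model.config.num_hidden_layers,
#                 unfreeze_last=2)   # or unfreeze_last=None for full fine-tuning
```

Starting with `unfreeze_last=None` matches the advice above. Passing a small `k` instead is the partial variant from option 1, which trains fewer parameters and so needs less memory for gradients and optimizer state.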

MahdiA · December 24, 2021, 12:59pm

I am writing this for anyone who runs similar experiments in the future.
I tested unfreezing on my dataset, but the model appears to be overfitting.
Epoch 4:
training loss = 0.0003, acc ~ 95%
validation loss = 4.3, acc ~ 40%

So, I am going to try the next option: pre-training a BERT model.
