Getting better sentence embeddings with BERT - is it just pretraining, or is it pretraining + fine tuning?

vintagedeek · February 17, 2021, 3:34pm

I am hoping to confirm my understanding of some definitions in the context of BERT.

(1) Pre-training means running a corpus through the BERT architecture, using masked language modeling and next sentence prediction as the objectives from which the weights are learned. You can do this (a) from scratch, with your own vocabulary and randomly initialized weights, or (b) starting from the pre-trained BERT vocabulary and weights (so you are, in effect, “pre-training a pre-trained model”).

(2) Fine tuning means adding a layer to the BERT architecture for some downstream task, such as classification.
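To make definition (1) concrete, here is a toy sketch of how the two pre-training objectives turn raw text into training examples. It assumes whitespace tokens rather than WordPiece and a hypothetical seven-word vocabulary (`TOY_VOCAB`), and follows the masking recipe described in the original BERT paper: about 15% of tokens are selected for prediction, of which 80% become `[MASK]`, 10% become a random token and 10% are left unchanged. For next sentence prediction, roughly half the pairs use the true next sentence and half a randomly drawn one. It is an illustration only, not the code Google or Hugging Face actually use.

```python
import random
from typing import List, Optional, Tuple

MASK = "[MASK]"
# Hypothetical toy vocabulary used for the "random token" replacement.
TOY_VOCAB = ["the", "bank", "river", "loan", "money", "water", "account"]


def mask_tokens(tokens: List[str], rng: random.Random,
                mask_prob: float = 0.15) -> Tuple[List[str], List[Optional[str]]]:
    """Return (corrupted input, labels); a label is None where no prediction is made."""
    inputs = list(tokens)
    labels: List[Optional[str]] = [None] * len(tokens)
    for i, tok in enumerate(tokens):
        if rng.random() >= mask_prob:
            continue  # token not selected for prediction
        labels[i] = tok  # the model must recover the original token here
        r = rng.random()
        if r < 0.8:
            inputs[i] = MASK                    # 80%: replace with [MASK]
        elif r < 0.9:
            inputs[i] = rng.choice(TOY_VOCAB)   # 10%: replace with a random token
        # remaining 10%: keep the original token unchanged
    return inputs, labels


def make_nsp_pair(doc: List[str], corpus: List[List[str]], idx: int,
                  rng: random.Random) -> Tuple[str, str, bool]:
    """Build one next-sentence-prediction example starting at sentence idx of doc."""
    first = doc[idx]
    if rng.random() < 0.5 and idx + 1 < len(doc):
        return first, doc[idx + 1], True        # IsNext: the real following sentence
    other_doc = rng.choice(corpus)
    return first, rng.choice(other_doc), False  # NotNext: a randomly drawn sentence


rng = random.Random(42)
masked, labels = mask_tokens("the bank approved the loan".split(), rng)
```

Option (b) in the definition simply runs this same kind of data through a model whose weights start from a released checkpoint instead of random initialization.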
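For definition (2), the added layer is usually just a linear classifier over one vector per sentence; in Hugging Face's `BertForSequenceClassification` it sits on the pooled `[CLS]` output (after dropout). The sketch below assumes random stand-in vectors instead of a real encoder, a hypothetical 3-class task and bert-base's hidden size of 768, and takes one gradient step on the new head. In standard fine-tuning the same loss is also backpropagated into all BERT layers, which is why fine-tuning changes the embeddings themselves and not only the new layer.

```python
import numpy as np

HIDDEN = 768      # bert-base hidden size
NUM_LABELS = 3    # hypothetical number of classes

rng = np.random.default_rng(0)

# Stand-in for the encoder output: in practice, the [CLS] (or pooled) vector
# BERT produces for each sentence in the batch.
cls_vectors = rng.normal(size=(4, HIDDEN))
labels = np.array([0, 2, 1, 0])

# The "added layer": a freshly initialised linear classifier.
W = rng.normal(scale=0.02, size=(HIDDEN, NUM_LABELS))
b = np.zeros(NUM_LABELS)


def softmax(z: np.ndarray) -> np.ndarray:
    z = z - z.max(axis=1, keepdims=True)  # subtract row max for numerical stability
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)


def forward(x: np.ndarray) -> np.ndarray:
    """Class probabilities from the linear head."""
    return softmax(x @ W + b)


def cross_entropy(probs: np.ndarray, y: np.ndarray) -> float:
    return float(-np.mean(np.log(probs[np.arange(len(y)), y])))


probs = forward(cls_vectors)
loss_before = cross_entropy(probs, labels)

# One gradient-descent step on the head only (gradient of softmax + cross-entropy).
# Real fine-tuning would also push this gradient back into every BERT layer.
grad_logits = probs.copy()
grad_logits[np.arange(len(labels)), labels] -= 1.0
grad_logits /= len(labels)
W -= 0.01 * cls_vectors.T @ grad_logits
b -= 0.01 * grad_logits.sum(axis=0)

loss_after = cross_entropy(forward(cls_vectors), labels)
```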

Questions
(A) Is there anything incorrect in my understanding above?

(B) Suppose my goal is only to get better embeddings (e.g., for computing cosine similarity between sentences). Would I just want to pre-train the model on my corpus? Is fine tuning also used to get better embeddings? For example, if I fine tune the pre-trained BERT model for some classification task, could I use the outputs of the second-to-last hidden layer to derive sentence embeddings that could later be used to compute cosine similarity between sentences? I currently use the second-to-last hidden layer of downloaded pre-trained BERT models for my sentence embeddings.
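The approach described in (B) (take the second-to-last layer, then compare with cosine similarity) can be sketched as follows. With Hugging Face `transformers`, calling a `BertModel` with `output_hidden_states=True` returns `hidden_states`, a tuple holding the embedding output followed by one tensor per encoder layer, so `hidden_states[-2]` is the second-to-last layer. The post does not say how token vectors are combined into one sentence vector; this sketch assumes mean pooling over non-padding tokens (one common choice) and uses random arrays in place of a real model so it runs offline.

```python
import numpy as np


def mean_pool(hidden: np.ndarray, attention_mask: np.ndarray) -> np.ndarray:
    """Average token vectors, ignoring padding.

    hidden: (batch, seq_len, hidden_size); attention_mask: (batch, seq_len) of 0/1.
    """
    mask = attention_mask[..., None].astype(hidden.dtype)
    summed = (hidden * mask).sum(axis=1)
    counts = np.clip(mask.sum(axis=1), 1e-9, None)  # avoid division by zero
    return summed / counts


def cosine_similarity(a: np.ndarray, b: np.ndarray) -> float:
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))


# Stand-in for `outputs.hidden_states` from a 12-layer BERT: 13 arrays
# (embedding output + one per encoder layer), each (batch, seq_len, 768).
rng = np.random.default_rng(0)
hidden_states = tuple(rng.normal(size=(2, 6, 768)) for _ in range(13))
attention_mask = np.array([[1, 1, 1, 1, 0, 0],   # sentence A: 4 real tokens
                           [1, 1, 1, 1, 1, 1]])  # sentence B: 6 real tokens

second_to_last = hidden_states[-2]
embeddings = mean_pool(second_to_last, attention_mask)
score = cosine_similarity(embeddings[0], embeddings[1])
```

The same pooling code works unchanged whether the hidden states come from a downloaded checkpoint, a further pre-trained model, or a fine-tuned one; only the weights producing `hidden_states` differ.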

I’m trying to understand: if you later wanted to do semantic similarity, would you rather derive embeddings from your pre-trained BERT or from your pre-trained AND fine tuned BERT?
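One practical way to settle this is empirically: take a small set of sentence pairs from your own domain with human similarity ratings, embed them with each candidate model, and compare the Spearman rank correlation between cosine similarity and the ratings (the metric commonly reported on STS benchmarks). A minimal sketch, assuming you already have one embedding per sentence from each model; `evaluate` and `average_ranks` are illustrative names, not a library API.

```python
from typing import List, Sequence, Tuple

import numpy as np


def average_ranks(x: Sequence[float]) -> np.ndarray:
    """1-based ranks, with tied values sharing the average of their ranks."""
    x = np.asarray(x, dtype=float)
    order = np.argsort(x, kind="mergesort")
    sorted_x = x[order]
    ranks = np.empty(len(x), dtype=float)
    i = 0
    while i < len(x):
        j = i
        while j + 1 < len(x) and sorted_x[j + 1] == sorted_x[i]:
            j += 1
        ranks[order[i:j + 1]] = (i + j) / 2 + 1  # average rank of the tie group
        i = j + 1
    return ranks


def spearman(a: Sequence[float], b: Sequence[float]) -> float:
    """Spearman correlation = Pearson correlation of the ranks.

    Undefined (NaN) if either input has only one distinct value.
    """
    return float(np.corrcoef(average_ranks(a), average_ranks(b))[0, 1])


def cosine(u: Sequence[float], v: Sequence[float]) -> float:
    u, v = np.asarray(u, dtype=float), np.asarray(v, dtype=float)
    return float(u @ v / (np.linalg.norm(u) * np.linalg.norm(v)))


def evaluate(pairs: List[Tuple[Sequence[float], Sequence[float]]],
             gold: Sequence[float]) -> float:
    """pairs: (embedding_a, embedding_b) per sentence pair; gold: human ratings."""
    return spearman([cosine(u, v) for u, v in pairs], gold)
```

Running `evaluate` on the same pairs and ratings with embeddings from the pre-trained model and from the fine-tuned model gives two numbers to compare; with only a handful of pairs the difference may not be meaningful. If SciPy is available, `scipy.stats.spearmanr` computes the same statistic.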

VP1 · March 2, 2021, 9:14am

@vintagedeek,

(1) Pre-training means running a corpus through the BERT architecture where masked language modeling and next sentence prediction are used to derive weights. You can do this (a) from scratch with your own vocabulary and randomly initialized weights or (b) using the pre-trained BERT vocab/weights (so you are in effect “pre-training a pre-trained model.”)
(2) Fine tuning means adding a layer to the BERT architecture for some downstream task, such as classification.

Seems fine to me.

For better embeddings and similarity, you may want to check this:

And this:

vintagedeek · March 2, 2021, 12:42pm

Thank you! These look very helpful.
