How does BERT know which contextualised embedding to choose for a word?

Hi!

I am trying to explain BERT. I understand the concept of contextualised embeddings, where one word has different embeddings depending on the context. I also understand that, by using bidirectionality, BERT can learn these contextualised embeddings during pretraining.
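To make it concrete, here is roughly what I mean by the same word getting different embeddings (just a sketch using the Hugging Face transformers library with bert-base-uncased; the example sentences and the little helper function are my own illustration, not anything from BERT itself):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def embedding_for(sentence, word):
    # Tokenise the sentence and run it through the pre-trained encoder
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        outputs = model(**inputs)
    # Pick out the hidden state at the position of the word of interest
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    idx = tokens.index(word)
    return outputs.last_hidden_state[0, idx]

river = embedding_for("i sat by the bank of the river", "bank")
money = embedding_for("i deposited money at the bank", "bank")

# The two vectors for "bank" differ, because each one is computed
# from the whole surrounding sentence
print(torch.cosine_similarity(river, money, dim=0))
```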

My question is: when fine-tuning BERT for a task and feeding it a sentence, how does BERT know which contextualised embedding to use for a word, given there are several to choose from?

I am new to the topic of Transformers, but my understanding (limited as it is) of BERT is that whilst the pre-trained embeddings are created through self-supervised learning (just large unannotated text corpora), fine-tuning BERT for a specific task is usually a supervised learning process. So effectively you tell it what the right answers are, and as part of the learning process in fine-tuning it will learn which contextualised embeddings (and maybe a lot of other related knowledge) are relevant to what you want it to do.
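As a rough illustration of what I mean by supervised fine-tuning (a minimal sketch with the Hugging Face transformers library; the two-label task, the example sentences, and the single training step are made up for the example):

```python
import torch
from transformers import AutoTokenizer, AutoModelForSequenceClassification

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
# A fresh classification head is added on top of the pre-trained encoder
model = AutoModelForSequenceClassification.from_pretrained(
    "bert-base-uncased", num_labels=2
)

# Labelled examples: this is where you tell it what the right answers are
sentences = ["i sat by the bank of the river", "i deposited money at the bank"]
labels = torch.tensor([0, 1])  # e.g. 0 = nature, 1 = finance (made-up task)

inputs = tokenizer(sentences, padding=True, return_tensors="pt")
optimizer = torch.optim.AdamW(model.parameters(), lr=2e-5)

# One supervised training step: the loss compares predictions to the labels,
# and backprop updates both the new head and the pre-trained BERT weights,
# so the contextual representations themselves get nudged towards the task
outputs = model(**inputs, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()
```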