Training BERT for word embedding

joval · July 13, 2022, 9:39am

Hello everyone,

I’m not familiar with BERT, but I’d like to train a BERT model just for word embedding (not NSP or MLM), in order to compare its impact on a task (I can give details if needed) against W2V.

In my case, I’d like to train BERT on my dataset, but all I can find in the literature is how to train BERT for MLM, for example. So I don’t know how to use this model to embed words.

Can someone please help me achieve this goal?

Nlpeva · July 13, 2022, 12:21pm

Please feel free to share more details. It really depends a lot on the context around the sentences/words you want to embed.
Hopefully these two discussions can help:

joval · July 13, 2022, 12:48pm

Hello @Nlpeva and thanks for your response.

In my case, I’d like to use BERT and W2V in Word Sense Disambiguation for ambiguous queries (Information Retrieval).
The goal with these models is to embed all context words of the ambiguous word.

So with W2V, I’ve just built a list of all sentences in my dataset and used the gensim package to train a W2V model on these sentences.
In the case of BERT, I’d like to do the same: just pass a list of sentences and have, at the end of training, a model like the original BERT that I can use to embed different words.

By the way, I’d like to note that I’ve been restricted by my supervisor to using just the original BERT model and training a new one on my dataset.

Thanks again for the first link; it’ll help me embed words after training the new BERT on my dataset, but first I have to train the model.

Nlpeva · July 13, 2022, 11:18pm

Hmm, have you looked at spacy-transformers? That might be a good fit for your project… Here’s also a paper I read. They tried fine-tuning BERT on the task of predicting the meaning of the ambiguous word.

That might be a bit too in-depth for what your supervisor wanted, though!

cog · July 14, 2022, 2:24am

hi @joval

The HF docs show the parameters and outputs of BertForMaskedLM.

You can train BERT MLM from scratch with that class.
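
As a rough sketch of what a single from-scratch MLM training step looks like with that class: the MLM labels are derived from the raw text itself (a token is hidden and the model is asked to recover it), so no annotated data is needed. The tiny config, the token ids and the `MASK_ID` value below are illustrative assumptions so the example runs without downloading anything; they are not recommended settings.

```python
import math

import torch
from transformers import BertConfig, BertForMaskedLM

# Tiny, randomly initialised BERT (assumed toy sizes) = training from scratch.
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
)
model = BertForMaskedLM(config)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)

MASK_ID = 4  # hypothetical id of the [MASK] token in this toy vocabulary
# Hypothetical token ids for one sentence; a real pipeline would use a tokenizer.
input_ids = torch.tensor([[2, 15, 27, 33, 41, 3]])

# Build labels from the text itself: -100 marks positions the loss ignores.
labels = torch.full_like(input_ids, -100)
masked_pos = 2
labels[0, masked_pos] = input_ids[0, masked_pos]

# Hide the chosen token in the model input.
masked_ids = input_ids.clone()
masked_ids[0, masked_pos] = MASK_ID

model.train()
outputs = model(input_ids=masked_ids, labels=labels)
outputs.loss.backward()
optimizer.step()
optimizer.zero_grad()

loss_value = outputs.loss.item()
```

In practice the masking is done randomly over many sentences rather than by hand; `DataCollatorForLanguageModeling` in transformers handles that step.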

Thanks to nielsr, there are some good tutorials on fine-tuning BERT with HF.

They will help you understand the whole training structure.

regards.

joval · July 14, 2022, 7:43am

Hi @cog .

Thank you for your response.

But as explained in my first message, I’m not looking for BertForMaskedLM, because I don’t aim to use it. I’d like to use BERT just to embed words, not to predict masked words.

joval · July 14, 2022, 8:13am

Thank you @Nlpeva for all these resources.

The only problem I have with all this is that, for my WSD in IR, I already have an existing unsupervised process (configured with W2V); the goal is just to see the impact of other word embedding models (especially BERT, as it is supposed to produce better results than W2V).

And to clarify, the process (with W2V) has already been validated with the results it produced. So to use BERT I can only adapt it to that process, by embedding context words.

Nlpeva · July 14, 2022, 5:10pm

What do you think about using Sentence-BERT?

https://www.sbert.net/examples/applications/computing-embeddings/README.html#sentence-embeddings-with-transformers

joval · July 14, 2022, 5:36pm

Hi @Nlpeva ,

I think it can be useful for WSD. But, as I described, I don’t want to embed sentences but words. So using it would force me to change my approach, which I can’t do, because the idea here is to compare the impact of BERT and W2V on this approach.

pritamdeka · July 19, 2022, 12:05pm

Hi. Sentence BERT is useful for words as well as sentences. You can use it to get word embeddings instead of sentence embeddings.

joval · July 19, 2022, 12:35pm

Hi @pritamdeka ,

Yes, that’s true and thank you.
But what I’d like to know is how to train BERT on a new dataset, not how to use an already pre-trained BERT.
And let me point out that I have no data annotated for either NSP or MLM.
That’s why I’m asking whether it’s possible and, if so, how to do it.

pritamdeka · July 19, 2022, 1:37pm

Would it be possible to know what the annotated data looks like?

joval · July 19, 2022, 1:49pm

For which task?

sdegrace · July 21, 2022, 5:57pm

I think there might be a bit of confusion about what BERT is. BERT was trained using MLM and next sentence prediction. You can fine-tune using MLM alone for simplicity’s sake. Once you have finished fine-tuning, all you have to do is grab the hidden states from the model before they are passed into the MLM head. You can do this by specifying output_hidden_states=True when calling the model.
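
A minimal sketch of that last step, assuming a tiny randomly initialised `BertForMaskedLM` in place of your trained model (so it runs without downloading weights) and made-up token ids in place of real tokenizer output:

```python
import torch
from transformers import BertConfig, BertForMaskedLM

# Stand-in for the trained model: toy sizes, random weights (assumptions).
config = BertConfig(
    vocab_size=100,
    hidden_size=32,
    num_hidden_layers=2,
    num_attention_heads=2,
    intermediate_size=64,
    max_position_embeddings=64,
)
model = BertForMaskedLM(config)
model.eval()

# Hypothetical token ids for one 6-token sentence.
input_ids = torch.tensor([[2, 15, 27, 33, 41, 3]])
attention_mask = torch.ones_like(input_ids)

with torch.no_grad():
    outputs = model(
        input_ids=input_ids,
        attention_mask=attention_mask,
        output_hidden_states=True,
    )

# hidden_states is a tuple: the embedding-layer output plus one tensor
# per encoder layer, each of shape (batch, seq_len, hidden_size).
hidden_states = outputs.hidden_states
last_layer = hidden_states[-1]

# One contextual vector per input token of the first sentence.
word_vectors = last_layer[0]
```

Keep in mind that BERT tokenizes into word pieces, so a word split into several pieces gets several vectors; averaging them is one common way to get a single vector per word, but that choice is up to you.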
