Is BERT a document embedding model?

Are BERT and its derivatives (like DistilBERT, RoBERTa, …) document embedding methods like Doc2Vec?

Do you mean they map words to vectors? Yes, they do, but it works differently from methods like word2vec; I am not sure about Doc2Vec, though. In word2vec, each word gets exactly one vector, and that’s it. This is not ideal, since some words have different meanings in different contexts: for example, there are banks where we deposit or withdraw money, and there are river banks. Word2vec gives both “bank”s the same vector, whereas in BERT the vector depends on the context.
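Here is a minimal sketch of that difference, assuming the Hugging Face `transformers` package and the `bert-base-uncased` checkpoint (the helper `bank_vector` is just for illustration): the same word “bank” gets a different vector in each sentence.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

def bank_vector(sentence):
    # Encode the sentence and return the contextual hidden state of the "bank" token.
    inputs = tokenizer(sentence, return_tensors="pt")
    with torch.no_grad():
        hidden = model(**inputs).last_hidden_state[0]   # shape: (num_tokens, 768)
    tokens = tokenizer.convert_ids_to_tokens(inputs["input_ids"][0])
    return hidden[tokens.index("bank")]

v_money = bank_vector("I deposited cash at the bank.")
v_river = bank_vector("We had a picnic on the river bank.")

# The two "bank" vectors are not identical; a static word2vec embedding would be.
print(torch.cosine_similarity(v_money, v_river, dim=0))
```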

True, Doc2vec is like w2v except that it also includes a document_id. So we can use BERT as both w2v and d2v, right?

Such models output a representation for each token in the context of the other tokens to its left and right. You need to aggregate these representations somehow to obtain a single vector representing a document. A common approach is to average the token vectors, for example. I’d suggest using sentence-transformers for this purpose.
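A minimal sketch of both options, assuming the `transformers` and `sentence-transformers` packages (the model names are just common defaults, not a specific recommendation):

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

text = "BERT outputs one vector per token, so we pool them into a single document vector."
inputs = tokenizer(text, return_tensors="pt")
with torch.no_grad():
    hidden = model(**inputs).last_hidden_state        # (1, num_tokens, 768)

# Option 1: mean-pool the token vectors (masking out padding) -> one 768-d vector.
mask = inputs["attention_mask"].unsqueeze(-1)         # (1, num_tokens, 1)
doc_vec = (hidden * mask).sum(dim=1) / mask.sum(dim=1)
print(doc_vec.shape)                                  # torch.Size([1, 768])

# Option 2: let sentence-transformers handle tokenization and pooling for you.
from sentence_transformers import SentenceTransformer
st_model = SentenceTransformer("all-MiniLM-L6-v2")
doc_vec2 = st_model.encode(text)                      # a single fixed-size vector
print(doc_vec2.shape)
```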


You mean that the 768 features we get in the BERT output cannot represent a document by themselves?

The BERT output is not just 768 features; it is 768 features for each token.
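A quick sketch of that shape, again assuming `transformers` and `bert-base-uncased`: the leading dimension grows with the number of tokens, while the second dimension stays at 768.

```python
import torch
from transformers import AutoTokenizer, AutoModel

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")

for text in ["Short sentence.", "A noticeably longer sentence with many more words in it."]:
    inputs = tokenizer(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs).last_hidden_state[0]
    # First dimension = number of tokens (varies), second = 768 (fixed).
    print(out.shape)
```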
