Search using raw word embedding similarity from BERT

Hello, I have a list of about 100k foods that I want users to be able to search through. I've explored using word2vec to map search terms and food names to vectors, then returning results by vector similarity (Euclidean distance). That worked OK, but I'm thinking BERT could perform better if I start from a pretrained BERT model and do transfer learning on a dataset of food recipes, etc. Then I could use its vector outputs to do search in a similar way.
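
For context, here is roughly what my current word2vec setup looks like (simplified; assuming gensim 4.x, and `food_word2vec.txt` is just a placeholder for wherever my trained vectors actually live):

```python
import numpy as np
from gensim.models import KeyedVectors

kv = KeyedVectors.load_word2vec_format("food_word2vec.txt")

def embed(text):
    """Average the word2vec vectors of the in-vocabulary tokens."""
    vecs = [kv[w] for w in text.lower().split() if w in kv.key_to_index]
    return np.mean(vecs, axis=0) if vecs else np.zeros(kv.vector_size)

def search(query, names, name_vecs, k=10):
    """Rank food names by Euclidean distance to the query vector."""
    dists = np.linalg.norm(name_vecs - embed(query), axis=1)
    return [(names[i], float(dists[i])) for i in np.argsort(dists)[:k]]

# Vectors for all ~100k food names are precomputed once:
# names = [...]; name_vecs = np.stack([embed(n) for n in names])
```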

First, does this approach even seem practical?

Looking at the BERT models in the documentation, it's not obvious to me how I can (A) extract the raw vectors, or (B) continue training the existing model on more data without adding new layers to it. (My current guesses at both are sketched below.)
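
For (A), the closest thing I've pieced together is the sketch below, assuming the Hugging Face transformers library (I'm not sure mean-pooling the last hidden layer is the right way to collapse BERT's per-token outputs into one vector per food name):

```python
import torch
from transformers import AutoModel, AutoTokenizer

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModel.from_pretrained("bert-base-uncased")
model.eval()

def bert_embed(text):
    """Mean-pool the last hidden layer into one fixed-size vector."""
    inputs = tok(text, return_tensors="pt")
    with torch.no_grad():
        out = model(**inputs)
    # out.last_hidden_state has shape (1, seq_len, 768); average over tokens.
    # (For batched inputs I'd presumably need to mask out padding first.)
    return out.last_hidden_state.mean(dim=1).squeeze(0)
```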

Is this something doable with BERT, and if so, can you point me to how I can do those things?
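
To show what I mean by (B): my best guess is that "training without new layers" amounts to continuing BERT's own masked-language-model pretraining on my recipe text, since the MLM head is part of the original model rather than a new layer. Here's a sketch of what I imagine, again assuming the Hugging Face transformers and datasets libraries (`recipes.txt` is a placeholder for my corpus, one document per line, and the hyperparameters are just illustrative):

```python
from datasets import load_dataset
from transformers import (AutoModelForMaskedLM, AutoTokenizer,
                          DataCollatorForLanguageModeling, Trainer,
                          TrainingArguments)

tok = AutoTokenizer.from_pretrained("bert-base-uncased")
# Loads BERT with its original masked-LM head -- no new layers added.
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

ds = load_dataset("text", data_files={"train": "recipes.txt"})["train"]
ds = ds.map(lambda ex: tok(ex["text"], truncation=True, max_length=128),
            batched=True, remove_columns=["text"])

# Randomly masks 15% of tokens per batch -- the standard BERT objective.
collator = DataCollatorForLanguageModeling(tokenizer=tok, mlm_probability=0.15)
args = TrainingArguments(output_dir="bert-food", num_train_epochs=1,
                         per_device_train_batch_size=16)

Trainer(model=model, args=args, train_dataset=ds,
        data_collator=collator).train()
```

No idea if that's the standard way to do it, so corrections welcome.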

Thanks!