Search using raw word embedding similarity from BERT

Hello, I have a list of about 100k foods I want users to be able to search through. I’ve explored using word2vec to map search terms and food names to vectors, then returning results by vector similarity (Euclidean distance). It worked OK, but I’m thinking BERT could perform better if I start from a pretrained BERT model and do transfer learning on a dataset of food recipes, etc. Then I could use its vector outputs to do search the same way.
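For context, my current approach is roughly the following (a minimal sketch with toy 3-d vectors standing in for the real trained word2vec embeddings; `EMBED`, `embed`, and `search` are illustrative names, not my actual code):

```python
import numpy as np

# Toy stand-in for word2vec vectors; in reality these come from a trained model.
EMBED = {
    "apple": np.array([1.0, 0.2, 0.0]),
    "banana": np.array([0.9, 0.3, 0.1]),
    "steak": np.array([0.0, 1.0, 0.8]),
}
DIM = 3

def embed(text):
    # Average the vectors of known words -- crude, but a common baseline
    # for turning a multi-word query or food name into one vector.
    vecs = [EMBED[w] for w in text.lower().split() if w in EMBED]
    return np.mean(vecs, axis=0) if vecs else np.zeros(DIM)

def search(query, foods, top_k=10):
    # Rank foods by Euclidean distance to the query vector (smaller = closer).
    q = embed(query)
    ranked = sorted(foods, key=lambda f: np.linalg.norm(embed(f) - q))
    return ranked[:top_k]
```

So e.g. `search("apple", ["banana", "steak"], top_k=1)` returns `["banana"]` with the toy vectors above. The idea with BERT would be to swap out `embed` for something backed by the fine-tuned model.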

First, does this approach even seem practical?

From the BERT documentation it’s not obvious to me how I can A) extract the raw vectors, and B) continue training the existing model on more data without adding new layers to it.

Is this something doable with BERT, and if so, can you point me to how I can do those things?