How to find closest embedding vectors?

neo-benjamin · July 22, 2022, 9:15pm

I have 100K known embeddings, i.e.

[emb_1, emb_2, ..., emb_100000]

My task is: given a new embedding (embedding_new), find the 10 closest embeddings among the 100K above.

The way I am approaching this problem is brute force.

For each query, I compare embedding_new with every embedding in [emb_1, emb_2, ..., emb_100000] and compute a similarity score for each.

Then I quicksort the similarity scores to get the 10 closest embeddings.

Is there a better way to achieve this?

osanseviero · July 24, 2022, 8:56am

Hey there! I don’t think you can get much faster than this, but I may be wrong. Once you have the list of similarities, you can use torch.topk to obtain the top results of interest.
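As a rough sketch of the torch.topk suggestion: the snippet below scores one query against all known embeddings with a single matrix-vector product and then picks the 10 best. The random tensors, the dimension of 128, and the choice of cosine similarity are assumptions for illustration only; the original post does not say which similarity measure or embedding size is used.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)

# Hypothetical stand-ins for the real data: 100K known embeddings
# of dimension 128 (assumed) and one new query embedding.
known = torch.randn(100_000, 128)
embedding_new = torch.randn(128)

# Normalize so that a dot product equals cosine similarity (assumed metric).
known_norm = F.normalize(known, dim=1)
query_norm = F.normalize(embedding_new, dim=0)

# One matrix-vector product scores the query against every known embedding.
similarities = known_norm @ query_norm  # shape: (100000,)

# torch.topk returns the 10 largest scores (sorted, best first) and their indices.
top_scores, top_indices = torch.topk(similarities, k=10)
```

Here top_indices holds the positions of the 10 closest embeddings in the original list, so a separate full sort of all 100K scores is not needed.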

Sharan2001 · July 26, 2022, 1:29am

Yes, please have a look at approximate nearest neighbour search, for example the Faiss library or the HNSW algorithm. You can also use libraries like Annoy.

Annoy library: spotify/annoy on GitHub (Approximate Nearest Neighbors in C++/Python optimized for memory usage and loading/saving to disk)

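The above-mentioned algorithms use specialized data structures such as graphs and trees to reduce query time from linear to sub-linear in the number of embeddings, typically at the cost of returning approximate rather than exact nearest neighbours.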

Thanks,
Sharan Babu
