I have `100K` known embeddings, i.e.

```
[emb_1, emb_2, ..., emb_100000]
```

My task: given a new embedding (`embedding_new`), find the 10 closest embeddings from the above `100K` embeddings.

The way I am approaching this problem is brute force. Given a query, I compare `embedding_new` with `[emb_1, emb_2, ..., emb_100000]` and compute the similarity scores. Then I quicksort the similarity scores to get the top `10` closest embeddings.
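The brute-force approach above can be sketched with NumPy (a minimal sketch; the embedding dimension and random data are placeholders). Note that a full quicksort is not needed for the top 10: `np.argpartition` finds them in linear time, and only the 10 survivors need sorting.

```python
import numpy as np

# Placeholder data: 100K known embeddings of dimension 128
rng = np.random.default_rng(0)
embeddings = rng.standard_normal((100_000, 128)).astype(np.float32)
embedding_new = rng.standard_normal(128).astype(np.float32)

# Cosine similarity of the query against all 100K rows in one matmul
norms = np.linalg.norm(embeddings, axis=1) * np.linalg.norm(embedding_new)
scores = embeddings @ embedding_new / norms

# argpartition: top-10 indices in O(n), instead of O(n log n) full sort
top10 = np.argpartition(scores, -10)[-10:]
top10 = top10[np.argsort(scores[top10])[::-1]]  # order those 10 by score
```

This is still a linear scan over all 100K embeddings per query, but it avoids sorting the full score list.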

Is there a better way to achieve this?


Hey there! I don’t think you can get much faster than this, but I may be wrong. Once you have the list of similarities, you can use `torch.topk` to obtain the top results of interest.
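In PyTorch that could look like the following (a minimal sketch with placeholder shapes; `torch.topk` returns the values and indices already sorted, so no explicit quicksort is needed):

```python
import torch

# Placeholder data: 100K known embeddings of dimension 128
torch.manual_seed(0)
embeddings = torch.randn(100_000, 128)
embedding_new = torch.randn(128)

# Cosine similarity of the query against every row (broadcast over dim 0)
scores = torch.nn.functional.cosine_similarity(
    embeddings, embedding_new.unsqueeze(0), dim=1
)

# Top-10 scores and their indices, sorted in descending order
values, indices = torch.topk(scores, k=10)
```

If the embeddings live on a GPU, the same code runs there unchanged, which can make even the brute-force scan quite fast.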


Yes, please have a look at nearest-neighbour search algorithms like Faiss, HNSW, etc. You can also use libraries like Annoy.

Annoy library: GitHub - spotify/annoy (Approximate Nearest Neighbors in C++/Python, optimized for memory usage and loading/saving to disk).

The above-mentioned algorithms use special data structures like graphs/trees to reduce the run time from linear to sub-linear.

Thanks,

Sharan Babu
