How to find the closest matching sentence using sentence transformer and faiss?

I am trying to do semantic search with Sentence Transformers and FAISS.

I am able to generate embeddings from the corpus and run a query with the query vector xq.
But what are t

import faiss
from sentence_transformers import SentenceTransformer, util
model = SentenceTransformer("all-MiniLM-L6-v2")

def get_embeddings(code_snippets: list[str]):
    return model.encode(code_snippets)

def build_vector_database(atlas_datapoints):
    dimension = 384  # all-MiniLM-L6-v2 returns 384-dimensional vectors; d below is read from the embeddings

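    # Note: commas are missing after the 2nd and 3rd strings, so Python joins
    # the last three literals into one string and the corpus has only 2 entries.
    corpus = ["tom loves candy",
                    "this is a test"
                    "hello world"
                    "jerry loves programming"]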

    emddings = get_embeddings(corpus)
    print(emddings.shape)

    d = emddings.shape[1]
    index = faiss.IndexFlatL2(d)
    print(index.is_trained)

    index.add(emddings)  # add the corpus embeddings computed above
    print(index.ntotal)

    k = 2
    xq = model.encode(["jerry loves candy"])

    D, I = index.search(xq, k)  # search
    print(I)
    print(D)

This code returns

[[0 1]]
[[1.3480902 1.6274161]]

But I can't tell which sentence xq is matching. I only get the indices and the distances, not the sentences themselves.

How can I find the top-N matching string from the corpus?

The array I returned by index.search holds the indices of the closest documents in your corpus, one row per query:

matched_sentences = [corpus[i] for i in I[0]]

Here the closest match is the first document of your corpus: “tom loves candy”.
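
If you also want the distance next to each sentence, something like the sketch below may help. The helper name top_matches is hypothetical (it is not part of FAISS or Sentence Transformers), and the distances and indices in the usage part are placeholder values chosen for illustration, not real model output. With a live index you would pass the D and I arrays from index.search instead. With IndexFlatL2, a smaller distance means a closer match.

```python
def top_matches(corpus, distances, indices):
    """Pair each hit for the first query with its sentence and distance.

    `distances` and `indices` have the shape returned by index.search:
    one row per query, k columns per row.
    """
    results = []
    for dist, idx in zip(distances[0], indices[0]):
        if idx == -1:  # FAISS fills in -1 when fewer than k neighbours exist
            continue
        results.append((corpus[int(idx)], float(dist)))
    return results


# Corpus with a comma after every string, so it really has 4 entries.
corpus = [
    "tom loves candy",
    "this is a test",
    "hello world",
    "jerry loves programming",
]

# Placeholder values for illustration only (NOT real model output).
# In the real script: D, I = index.search(xq, k)
distances = [[1.35, 1.60]]
indices = [[0, 3]]

for sentence, dist in top_matches(corpus, distances, indices):
    print(f"{dist:.2f}  {sentence}")
```

The result keeps the order FAISS returns, so the first pair is the closest sentence.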
