Improving semantic search with zero-shot image classification

Hi all, I’m just getting started with semantic search and zero-shot image classification. This is my first foray into data science, so please bear with me.

I am hoping that someone can help me get better results from my semantic search.

This is what I have done so far:

  • Downloaded a sample set of 4000 images from Unsplash
  • Created an OpenSearch index to store the vectors, as follows:

{
  "settings": {
    "index": {
      "knn": true,
      "knn.algo_param.ef_search": 512
    }
  },
  "mappings": {
    "properties": {
      "image_vector": {
        "type": "knn_vector",
        "dimension": 512,
        "method": {
          "name": "hnsw",
          "space_type": "cosinesimil",
          "engine": "nmslib",
          "parameters": {
            "ef_construction": 512,
            "m": 16
          }
        }
      }
    }
  }
}
  • Processed each image with CLIP ViT-B/32 and stored the resulting embedding in OpenSearch
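import clip
import torch
from PIL import Image

device = "cpu"
model, preprocess = clip.load("ViT-B/32", device=device)

def create_image_embedding(image_path):
    # Load and preprocess the image, then add a batch dimension
    # (call reconstructed from the standard CLIP usage pattern)
    image = preprocess(Image.open(image_path)).unsqueeze(0).to(device)
    with torch.no_grad():
        image_features = model.encode_image(image)
    return image_features.tolist()[0]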
  • Created a text embedding of the search term using the same model
def create_text_embedding(text):
    # Tokenize the search term and move it to the same device as the model
    text = clip.tokenize([text]).to(device)
    with torch.no_grad():
        text_features = model.encode_text(text)
    return text_features.tolist()[0]
  • Queried the OpenSearch index and retrieved the results
query = {
    "size": 100,
    "_source": {"excludes": ["image_vector"]},
    "query": {
        "knn": {
            "image_vector": {
                "vector": text_embedding,
                "k": 100,
            }
        }
    },
}
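
For reference, the knn query above ranks images by cosine similarity between the text embedding and each stored image vector, using an approximate HNSW index. With only 4000 images, an exact brute-force ranking is cheap and could serve as a baseline for the approximate results. This is a minimal sketch in plain Python, assuming the image vectors have already been loaded into a dict keyed by image id; the names `cosine_similarity`, `exact_top_k`, and `image_vectors` are hypothetical and not part of the code above.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # treat a zero vector as matching nothing
    return dot / (norm_a * norm_b)

def exact_top_k(query_vector, image_vectors, k=100):
    """Score every stored vector against the query; return (id, score) pairs, best first.

    image_vectors is assumed to be a dict of {image_id: vector}.
    """
    scored = [
        (image_id, cosine_similarity(query_vector, vector))
        for image_id, vector in image_vectors.items()
    ]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]
```

Only the order of the results is comparable here, not the raw `_score` values, since OpenSearch may rescale the similarity. If the exact ranking shows the same odd matches, the HNSW parameters are probably not the cause.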

The results are reasonably good overall, but some inaccurate or odd matches rank relatively high. Granted, the dataset is only 4000 images, so that may be a limiting factor.
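
To make "relatively high" concrete, it may help to look at the scores next to the ranks. The sketch below assumes the usual OpenSearch search response shape (`response["hits"]["hits"]`, each hit carrying `_id` and `_score`); the helper names `summarize_hits` and `largest_score_drop` are hypothetical, and the "largest drop" cut-off is only a rough heuristic, not a recommended threshold.

```python
def summarize_hits(response):
    """Return (rank, id, score) for each hit in an OpenSearch search response."""
    hits = response["hits"]["hits"]
    return [
        (rank, hit["_id"], hit["_score"])
        for rank, hit in enumerate(hits, start=1)
    ]

def largest_score_drop(summary):
    """Return the rank after which the score falls the most.

    Returns None when there are fewer than two hits to compare.
    """
    if len(summary) < 2:
        return None
    drops = [
        (summary[i][2] - summary[i + 1][2], summary[i][0])
        for i in range(len(summary) - 1)
    ]
    # max() with a key returns the first largest drop, i.e. the earliest rank on ties
    return max(drops, key=lambda drop: drop[0])[1]
```

If the odd matches all sit below the largest drop, a score cut-off might be enough; if they are mixed in with good matches at similar scores, the embeddings themselves are the more likely limit.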

Are there any other knobs I can tweak to improve the search accuracy?

Thanks for the help!