Caching image prototype embeddings for image-guided object detection using OWL-ViT

Did you happen to get a solution or alternative for this? I am trying to do something similar.
