Image/tag retrieval system

Hi, I would like to build an image/tag retrieval system. Given an image, I could look in the latent space for the tags that fit it, and the other way around: given tags, retrieve the matching images.

I was thinking about using CLIP, with the data being pairs of an image and a tag/keyword. This way I don't have any semantic information, which I don't want anyway. Would that still work, or do I need some other model, or maybe an easier technique (since there would be no semantic information)?
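
To make the idea concrete, here is a rough sketch of the retrieval step I have in mind, assuming image and tag embeddings already live in one shared space. The vectors and file names are made-up placeholders, not real CLIP outputs, and `top_k` is only a hypothetical helper for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, candidates, k=2):
    """Return the names of the k candidates most similar to the query vector."""
    ranked = sorted(
        candidates.items(),
        key=lambda item: cosine(query, item[1]),
        reverse=True,
    )
    return [name for name, _ in ranked[:k]]


# Placeholder embeddings (assumption): in practice these would come from
# the image encoder and the text encoder of the trained model.
tag_vecs = {
    "dog": [1.0, 0.1, 0.0],
    "beach": [0.0, 1.0, 0.1],
    "car": [0.0, 0.1, 1.0],
}
image_vecs = {
    "img_dog_on_beach.jpg": [0.7, 0.7, 0.0],
    "img_car.jpg": [0.05, 0.0, 1.0],
}

# Image -> tags: rank all tag embeddings against one image embedding.
tags_for_image = top_k(image_vecs["img_dog_on_beach.jpg"], tag_vecs, k=2)

# Tag -> images: rank all image embeddings against one tag embedding.
images_for_tag = top_k(tag_vecs["car"], image_vecs, k=1)

print(tags_for_image)
print(images_for_tag)
```

Both directions use the same nearest-neighbour lookup, so the open question is only whether training on single tags gives embeddings good enough for it.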

Thanks!
