Deploying CLIP-Vit as an inference endpoint

@radames, I want to deploy sentence-transformers/clip-ViT-B-32 as an inference endpoint, but I can't figure out how to make it accept an image. I know that locally this model can accept both images and text, but how do I do that on an HF deployed endpoint?

My goal is to generate embeddings from an image…

Hi @tech-untukmu-ai,

Since this model needs to encode both images and text, you'll want to deploy a custom handler that handles image and text payloads — please follow this guide here
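As a starting point, a custom handler sketch might look like the following. This assumes a `handler.py` at the repo root with an `EndpointHandler` class (the convention from the custom handler guide), that `sentence-transformers` is listed in the endpoint's `requirements.txt`, and that the client sends either plain text or a base64-encoded image under an `"image"` key — the payload shape is my assumption, not a fixed API:

```python
# handler.py — minimal sketch of a custom Inference Endpoints handler
# for sentence-transformers/clip-ViT-B-32 (image OR text -> embedding).
import base64
from io import BytesIO
from typing import Any, Dict, List, Union

from PIL import Image


def decode_payload(inputs: Union[str, Dict[str, Any]]):
    """Turn the request body into something model.encode() accepts:
    a PIL image for base64 "image" payloads, plain text otherwise."""
    if isinstance(inputs, dict) and "image" in inputs:
        # Assumed payload shape: {"inputs": {"image": "<base64 PNG/JPEG>"}}
        return Image.open(BytesIO(base64.b64decode(inputs["image"])))
    if isinstance(inputs, dict):
        # Assumed payload shape: {"inputs": {"text": "a photo of a cat"}}
        return inputs.get("text", "")
    return inputs  # bare string payload


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Import here so the model dependency is only needed at deploy time;
        # `path` is the repository directory the endpoint passes in.
        from sentence_transformers import SentenceTransformer
        self.model = SentenceTransformer(path or "sentence-transformers/clip-ViT-B-32")

    def __call__(self, data: Dict[str, Any]) -> List[float]:
        item = decode_payload(data.get("inputs", data))
        # CLIP puts both modalities in the same embedding space,
        # so the same encode() call works for images and text.
        return self.model.encode(item).tolist()
```

The client would then POST JSON like `{"inputs": {"image": "<base64>"}}` (after base64-encoding the image bytes) and get the embedding back as a list of floats.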