Image-To-Text task on Inference Endpoint

I would like to deploy an Image-To-Text model on an Inference Endpoint. For example, the following model.

When I try to deploy this model, the task is set to custom, but it fails to start because there is no

The error log indicates that there is a task called image-to-text, but it cannot be selected from the Inference Endpoint configuration screen.

KeyError: "Unknown task custom, available tasks are ['audio-classification', 'automatic-speech-recognition', 'conversational', 'depth-estimation', 'document-question-answering', 'feature-extraction', 'fill-mask', 'image-classification', 'image-segmentation', 'image-to-text', 'ner', 'object-detection', 'question-answering', 'sentiment-analysis', 'summarization', 'table-question-answering', 'text-classification', 'text-generation', 'text2text-generation', 'token-classification', 'translation', 'video-classification', 'visual-question-answering', 'vqa', 'zero-shot-audio-classification', 'zero-shot-classification', 'zero-shot-image-classification', 'zero-shot-object-detection', 'translation_XX_to_YY']"

Is it not possible to set the task to image-to-text in my Inference Endpoint? And if the only way is to set the task to custom, is there a template for image-to-text somewhere?


Hi, I was wondering if there has been any update on this? I am running into the same problem.

We are working on adding it as a zero-code deployment. Will keep you posted.


Is there any alternative solution to this?

Any update?

I have used this model. You can use its own API URL.

I am trying to do the same. Is there any help here? If I set the task to Image-Classification, it asks for input_ids, which I am not able to provide either through the UI or through my request.

Any updates on this? I need to do the same thing and have had no success.

The problem is that when I try to create an Inference Endpoint, I get this warning:
Warning: deploying this model may fail because a


file was not found in the repository. Try selecting a different model or creating a custom handler.

Hi @philschmid,
any updates on this?

This is how you do it through custom endpoints:

However, I think the best approach now is to follow along with what
@philschmid shared today for the SageMaker implementation.

@thorikawa you can also use this one, which is for the BLIP model:

@ckandemir You are a legend!
Thank you so much.
My struggle was that once I had cloned the repo (the BLIP model repo), I wasn’t able to generate the correct dependencies, such as the PyTorch .bin file and the weights, which I needed in order to attach my custom handler and then create the Inference Endpoint.

Do you mind sharing how you were able to do the whole process? (Sorry, I am a newbie when it comes to this stuff, if you can’t tell… lol.)

Also, I don’t want the payload to be a list of images; I just want to be able to attach one image to be processed.

Anyway, you’ve been a lot of help already, and I would totally understand if this is too much to ask :)

@pdichone basically, when you create a custom endpoint handler, you are reconfiguring the forward method to align with the generate function of the model you are using behind the endpoint, and tweaking the payload to match the underlying model’s accepted input type. @philschmid please correct me if I am wrong.
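
To make that idea concrete, here is a minimal sketch of what such a handler.py could look like for a BLIP captioning model. It is not the code linked in this thread. It assumes the repository contains BLIP weights that load with BlipProcessor and BlipForConditionalGeneration, and that the client sends JSON with a single base64-encoded image under "inputs". The helper name extract_image_bytes and the max_new_tokens value are illustrative choices, not anything prescribed by Inference Endpoints.

```python
import base64
import io
from typing import Any, Dict, List


def extract_image_bytes(data: Dict[str, Any]) -> bytes:
    """Return the raw bytes of the single image in the request payload.

    Assumption: the client sends {"inputs": "<base64 string>"}.
    """
    inputs = data.get("inputs")
    if not isinstance(inputs, str):
        raise ValueError("expected one base64-encoded image string under 'inputs'")
    return base64.b64decode(inputs)


class EndpointHandler:
    def __init__(self, path: str = ""):
        # Imported here so the helper above can be used without transformers installed.
        from transformers import BlipForConditionalGeneration, BlipProcessor

        # `path` is the local directory of the model repository on the endpoint.
        self.processor = BlipProcessor.from_pretrained(path)
        self.model = BlipForConditionalGeneration.from_pretrained(path)
        self.model.eval()

    def __call__(self, data: Dict[str, Any]) -> List[Dict[str, str]]:
        import torch
        from PIL import Image

        # Turn the payload into the input type the model accepts (a PIL image).
        image = Image.open(io.BytesIO(extract_image_bytes(data))).convert("RGB")
        model_inputs = self.processor(images=image, return_tensors="pt")

        # Call generate() rather than forward(), since captioning is a generation task.
        with torch.no_grad():
            output_ids = self.model.generate(**model_inputs, max_new_tokens=50)

        caption = self.processor.decode(output_ids[0], skip_special_tokens=True)
        return [{"generated_text": caption}]
```

The payload handling is kept in its own function so that changing the accepted input (for example, a URL instead of base64) only touches one place.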

I would recommend following this documentation here and building a couple of throwaway endpoints to get the hang of it. Then you can easily spot what’s going on in the logs when you deploy your endpoints.

You can tweak this code here to serve your use case.
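
For the single-image case asked about earlier in the thread, one possible request shape is sketched below. This is separate from the linked code and only illustrative: it assumes a custom handler that reads one base64-encoded image from "inputs", and the function names, endpoint URL, and token are all placeholders you would replace with your own.

```python
import base64
import json
import urllib.request
from typing import Any, Dict


def build_payload(image_bytes: bytes) -> Dict[str, Any]:
    """Wrap exactly one image as a JSON-serializable payload (not a list of images)."""
    return {"inputs": base64.b64encode(image_bytes).decode("utf-8")}


def caption_image(endpoint_url: str, token: str, image_path: str) -> Any:
    """POST one image to the endpoint and return the parsed JSON response.

    `endpoint_url` and `token` are placeholders for your own endpoint URL
    and access token.
    """
    with open(image_path, "rb") as f:
        payload = build_payload(f.read())

    request = urllib.request.Request(
        endpoint_url,
        data=json.dumps(payload).encode("utf-8"),
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

Whatever shape you pick on the client side has to match what the handler's __call__ expects, so it is worth changing both together.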