How do I use image-text-to-text models with the Hugging Face Inference API?

Hi all, I want to use Phi-3.5-vision-instruct for a multimodal task that takes the image of a page and converts the contents of the page into HTML tags, and if there is an image on the page, converts that image into text and adds an 'illustration' tag beside it.

How do I use the Hugging Face Inference API with Phi-3.5-vision-instruct so that it generates this output?
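For context, here is roughly what I expected the call to look like, as a sketch using `huggingface_hub`'s `InferenceClient` (the token, image URL, and prompt are placeholders, and this assumes the model is actually deployed on the serverless API):

```python
# Sketch only: assumes the model is served on the serverless Inference API,
# which (as the replies below explain) it isn't for custom-code models.
from huggingface_hub import InferenceClient

client = InferenceClient(token="hf_...")  # placeholder token

response = client.chat_completion(
    model="microsoft/Phi-3.5-vision-instruct",
    messages=[
        {
            "role": "user",
            "content": [
                {"type": "image_url",
                 "image_url": {"url": "https://example.com/page.png"}},  # placeholder
                {"type": "text",
                 "text": "Convert this page into HTML tags. If the page "
                         "contains an image, describe it in text and add "
                         "an 'illustration' tag beside it."},
            ],
        }
    ],
    max_tokens=1024,
)
print(response.choices[0].message.content)
```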


It’s in the documentation, but image-text-to-text is a newly implemented pipeline, so I don’t know whether it really works yet.
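If the pipeline does work, usage would look roughly like this sketch (the image URL is a placeholder; `trust_remote_code=True` is needed because the model ships custom code, and whether that plays well with the new pipeline is exactly the open question):

```python
# Sketch of the newly added "image-text-to-text" pipeline in transformers.
# Requires a recent transformers release and an explicit opt-in to the
# model's custom code via trust_remote_code=True.
from transformers import pipeline

pipe = pipeline(
    "image-text-to-text",
    model="microsoft/Phi-3.5-vision-instruct",
    trust_remote_code=True,
)

out = pipe(
    images="https://example.com/page.png",  # placeholder image
    text="<|image_1|>\nConvert the contents of this page into HTML tags.",
    max_new_tokens=512,
)
print(out)
```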

Hi @perceptron-743
the model microsoft/Phi-3.5-vision-instruct contains custom code.
By default, the Inference API is disabled for such models. This is done for security reasons, since you can’t know what code the model is running in the background.
Another way to see whether a model has the Inference API disabled is to go to the model page and check on the right:


If the icon shows a broken lightning bolt, it means the Inference API is disabled for that model.
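Since the hosted API won’t execute the custom code, the usual workaround is to run the model yourself. Here is a rough sketch following the pattern in the model card (it assumes a GPU, a recent transformers release, and a placeholder local file name):

```python
# Sketch: running Phi-3.5-vision-instruct locally. trust_remote_code=True
# explicitly opts in to executing the custom modeling code from the repo.
from PIL import Image
from transformers import AutoModelForCausalLM, AutoProcessor

model_id = "microsoft/Phi-3.5-vision-instruct"
model = AutoModelForCausalLM.from_pretrained(
    model_id, trust_remote_code=True, torch_dtype="auto", device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id, trust_remote_code=True)

image = Image.open("page.png")  # placeholder file
messages = [{"role": "user",
             "content": "<|image_1|>\nConvert the contents of this page into HTML tags."}]
prompt = processor.tokenizer.apply_chat_template(
    messages, tokenize=False, add_generation_prompt=True
)
inputs = processor(prompt, [image], return_tensors="pt").to(model.device)

generated = model.generate(**inputs, max_new_tokens=512)
# Decode only the newly generated tokens, not the prompt.
answer = processor.batch_decode(
    generated[:, inputs["input_ids"].shape[1]:], skip_special_tokens=True
)[0]
print(answer)
```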
To filter the Hub for models that have the Inference API enabled, you can check out the Warm and Cold filters on the models page.
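The same check can be done programmatically. This is a sketch with `huggingface_hub`; the `inference` filter on `list_models` assumes a reasonably recent release of the library:

```python
# Sketch: list image-text-to-text models whose serverless Inference API
# status is "warm" (i.e., currently deployed and ready to serve).
from huggingface_hub import HfApi

api = HfApi()
for m in api.list_models(
    pipeline_tag="image-text-to-text",
    inference="warm",
    limit=10,
):
    print(m.id)
```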


You’re right, but isn’t the server-side behavior a bit buggy?
When Inference is turned off, the input field itself doesn’t appear on the model page, which is the normal behavior for other models with inference disabled.
Moreover, the authors have not explicitly turned Inference off; the model card metadata actually appears to enable the widget:

---
license: mit
license_link: https://huggingface.co/microsoft/Phi-3.5-vision-instruct/resolve/main/LICENSE
language:
- multilingual
pipeline_tag: image-text-to-text
tags:
- nlp
- code
- vision
inference:
  parameters:
    temperature: 0.7
widget:
- messages:
  - role: user
    content: <|image_1|>Can you describe what you see in the image?
library_name: transformers
---