Issue with Salesforce/blip-image-captioning-large Endpoint: "input_ids or inputs_embeds" Error

Illyap · November 15, 2023, 9:05am

Hello Hugging Face Community,

I am reaching out to seek your expertise regarding an issue I’m facing with the Salesforce/blip-image-captioning-large model via the Inference Endpoints.

Here’s a detailed outline of the problem:

Inference API Functionality: When using the Inference API, the process is smooth. I can send an image URL using json={"inputs": image_url}, and it returns the expected caption without the need to download the image.
Inference Endpoint Issue: However, the same request fails when I switch to Inference Endpoints. Whether I send the image URL directly or download the image and send it, I get the following error: {"error": "You have to specify either input_ids or inputs_embeds or encoder_embeds"}. The error persists and prevents the endpoint from generating a caption.
Endpoint Testing Failure: To further investigate, I utilized the ‘Test your endpoint!’ feature within the Hugging Face platform by dragging and dropping an image directly. Unfortunately, this also resulted in the same error message.
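
For reference, the two request shapes described above can be sketched as follows. This is a minimal illustration only, not a confirmed fix: the endpoint URL, the token, and the helper names (build_url_request, build_bytes_request, send) are hypothetical placeholders, and the image/jpeg content type for the raw-bytes variant is an assumption. Which body a given endpoint accepts depends on the task and handler it was deployed with.

```python
import json
import urllib.request

# Hypothetical placeholders: substitute your own endpoint URL and access token.
ENDPOINT_URL = "https://example.invalid/your-endpoint"
HF_TOKEN = "hf_xxx"


def build_url_request(endpoint_url, token, image_url):
    """Build a POST with the JSON body {"inputs": image_url} from the post."""
    body = json.dumps({"inputs": image_url}).encode("utf-8")
    return urllib.request.Request(
        endpoint_url,
        data=body,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": "application/json",
        },
        method="POST",
    )


def build_bytes_request(endpoint_url, token, image_bytes, mime="image/jpeg"):
    """Build a POST whose body is the raw image bytes.

    The image/jpeg default is an assumption; use the type of your image.
    """
    return urllib.request.Request(
        endpoint_url,
        data=image_bytes,
        headers={
            "Authorization": f"Bearer {token}",
            "Content-Type": mime,
        },
        method="POST",
    )


def send(request):
    """Send a prepared request and decode the JSON response."""
    with urllib.request.urlopen(request) as response:
        return json.loads(response.read().decode("utf-8"))
```

To try it, build one of the two requests with your own values, for example build_url_request(ENDPOINT_URL, HF_TOKEN, image_url), and pass it to send(). Comparing the responses to both shapes can help narrow down which input format the deployed handler expects.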

The crux of the problem seems to be the expected request structure for the Inference Endpoint, which differs from that of the Inference API. The error suggests that the endpoint expects specific parameters (input_ids, inputs_embeds, or encoder_embeds) that are either not clearly documented or different from the Inference API’s requirements.

I am looking for guidance on how to resolve this inconsistency:

What is the correct way to structure the request for the Inference Endpoint when using image URLs?
Is there a step I might be overlooking that would account for the difference in behavior between the Inference API and the Inference Endpoint?
Has anyone successfully used the Inference Endpoint with image URLs, and if so, could you share an example request?

Any insights, code snippets, or documentation references that could shed light on this issue would be incredibly valuable.

Thank you in advance for your assistance and support.

Warm regards,
Illya

afromobile · December 12, 2023, 2:30pm

i wish HF documented the expected inputs for the API and inference endpoints. i assumed they were the same schema.

did you resolve this for the Salesforce/blip model? because i’m facing the same issue…
