I love the HF inference providers, but I've now run into a question:
Is it possible to also get access to the model's processor output via the API?
My specific use-case is with Qwen2.5-VL. I ask the model to perform localization tasks on document images, i.e. to return bounding box coordinates for page elements. The model generally does very well at this.
In order to correctly map the localization data returned by the model back to my original image sizes, I found that I needed access to the processor's inputs. That's because the Qwen processor resizes the input images, which I believe is common for models with vision encoders. In my case, using the transformers library:
inputs = processor(text=[text], images=images, padding=True, return_tensors="pt")
...
output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
# Strip the prompt tokens so only the newly generated tokens remain
generated_ids = [out_ids[len(in_ids):] for in_ids, out_ids in zip(inputs.input_ids, output_ids)]
output_text = processor.batch_decode(generated_ids, skip_special_tokens=True, clean_up_tokenization_spaces=True)
# The processor reports the image grid as (t, h, w) in units of 14-pixel patches,
# so multiplying by the patch size recovers the resized input image dimensions:
input_height = inputs['image_grid_thw'][0][1] * 14
input_width = inputs['image_grid_thw'][0][2] * 14
The model's localization coordinates are expressed in that resized image's pixel space, so knowing it is essential for scaling the coordinates to whatever image dimensions the user actually sees.
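For context, the rescaling step itself is straightforward once the processor's input size is known. A minimal sketch (the `scale_bbox` helper is my own, and it assumes the model returns absolute pixel coordinates in the resized image's space):

```python
def scale_bbox(bbox, input_size, target_size):
    """Map an (x1, y1, x2, y2) box from the processor's resized image
    to the dimensions the user actually sees."""
    in_w, in_h = input_size
    out_w, out_h = target_size
    sx, sy = out_w / in_w, out_h / in_h
    x1, y1, x2, y2 = bbox
    return (x1 * sx, y1 * sy, x2 * sx, y2 * sy)

# e.g. the model saw a 1092x1540 resized page, the user sees a 2480x3508 original
scaled = scale_bbox((100, 200, 300, 400), (1092, 1540), (2480, 3508))
```

The point is that both `input_width` and `input_height` come from the processor, which is exactly the piece I can't see when calling the hosted API.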
How could I solve this using the Inference API?