frames = [.....List of PIL.Image....]
inputs = processor(images=frames, text=text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs)
Error:
RuntimeError: The size of tensor a (3) must match the size of tensor b (6) at non-singleton dimension 2
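For what it's worth, a shape mismatch like this can come from batching a list of frames against a single text string (not confirmed as the cause here); the processor generally expects one prompt per image. A rough sketch of that pairing, with a hypothetical prompt:

# Hedged sketch, not a confirmed fix: give every frame its own prompt
# (the prompt string below is just a placeholder).
texts = ["a cat. a remote control."] * len(frames)
inputs = processor(images=frames, text=texts, padding=True, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)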
royve:
I have the same problem… any updates?
Thanks!
Hi @royve, thanks for the question. It would help to have a minimal reproducible example and your environment details.
I was able to run a batched inference with the following env and code:
- `transformers` version: 4.44.0.dev0
- Platform: Linux-6.5.0-1020-aws-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.4.0+cu118 (True)
- GPU type: NVIDIA A10G
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection
model_id = "IDEA-Research/grounding-dino-tiny"
device = "cuda"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id).to(device)
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)
images = [image, image]
texts = [
"a cat. a remote control.",
"a cat. a remote control. a sofa.",
]
inputs = processor(images=images, text=texts, padding=True, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
w, h = image.size
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.4,
    text_threshold=0.3,
    target_sizes=[(h, w), (h, w)],
)
print(results)
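If you want per-image detections instead of the raw printout, here is a minimal sketch; it assumes each entry of results is a dict with "scores", "labels" (the matched text phrases) and "boxes", which is how recent transformers releases format this output:

# Minimal sketch of iterating the post-processed results
# (key names assumed from recent transformers releases).
for image_idx, result in enumerate(results):
    for score, label, box in zip(result["scores"], result["labels"], result["boxes"]):
        print(f"image {image_idx}: {label} ({score.item():.2f}) at {box.tolist()}")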