frames = [.....List of PIL.Image....]
inputs = processor(images=frames, text=text, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
print(outputs)
Error:
RuntimeError: The size of tensor a (3) must match the size of tensor b (6) at non-singleton dimension 2
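For what it's worth, a shape mismatch like this can come from batching a list of frames against a single text string (not confirmed as the cause here); the processor generally expects one prompt per image. A rough sketch of that pairing, with a hypothetical prompt:

# Hedged sketch, not a confirmed fix: give every frame its own prompt
# (the prompt string below is just a placeholder).
texts = ["a cat. a remote control."] * len(frames)
inputs = processor(images=frames, text=texts, padding=True, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)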
royve:
I have the same problem… any updates?
Thanks!
Hi @royve, thanks for the question. It would help to have a minimal reproducible example and your environment details.
I was able to run a batched inference with the following env and code:
- `transformers` version: 4.44.0.dev0
- Platform: Linux-6.5.0-1020-aws-x86_64-with-glibc2.35
- Python version: 3.10.12
- PyTorch version (GPU?): 2.4.0+cu118 (True)
- GPU type: NVIDIA A10G
import requests
import torch
from PIL import Image
from transformers import AutoProcessor, AutoModelForZeroShotObjectDetection
model_id = "IDEA-Research/grounding-dino-tiny"
device = "cuda"
processor = AutoProcessor.from_pretrained(model_id)
model = AutoModelForZeroShotObjectDetection.from_pretrained(model_id).to(device)
image_url = "http://images.cocodataset.org/val2017/000000039769.jpg"
image = Image.open(requests.get(image_url, stream=True).raw)
images = [image, image]
texts = [
"a cat. a remote control.",
"a cat. a remote control. a sofa.",
]
inputs = processor(images=images, text=texts, padding=True, return_tensors="pt").to(device)
with torch.no_grad():
    outputs = model(**inputs)
w, h = image.size
results = processor.post_process_grounded_object_detection(
    outputs,
    inputs.input_ids,
    box_threshold=0.4,
    text_threshold=0.3,
    target_sizes=[(h, w), (h, w)],
)
print(results)
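If you want per-image detections instead of the raw printout, here is a minimal sketch; it assumes each entry of results is a dict with "scores", "labels" (the matched text phrases) and "boxes", which is how recent transformers releases format this output:

# Minimal sketch of iterating the post-processed results
# (key names assumed from recent transformers releases).
for image_idx, result in enumerate(results):
    for score, label, box in zip(result["scores"], result["labels"], result["boxes"]):
        print(f"image {image_idx}: {label} ({score.item():.2f}) at {box.tolist()}")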