Dear Hugging Face users,
I’m trying to implement batched image inference with OWL-ViT. At the moment I’m working on a set of 11 images, with 72 labels and batch_size=2. I took the batching approach from here:
The only difference is that I’m using the “google/owlvit-large-patch14” model instead of “google/owlvit-large-patch32”. The code works fine for the first two images, but on the third I get:
RuntimeError: shape '[4, 37, 768]' is invalid for input of size 115200
The error is raised at this line: with torch.no_grad(): outputs = model(**inputs)
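For context, this is roughly how I build each batch (a simplified sketch with placeholder file names and label strings instead of my real data; the actual loop then calls processor(text=texts, images=images, return_tensors="pt") and passes the result to the model as above):

```python
# Placeholders standing in for my real 11 image paths and 72 labels.
image_paths = [f"image_{n}.jpg" for n in range(11)]
labels = [f"label {n}" for n in range(72)]
batch_size = 2

# Split the image list into consecutive batches of at most batch_size items.
batches = [image_paths[i:i + batch_size]
           for i in range(0, len(image_paths), batch_size)]
print([len(b) for b in batches])  # [2, 2, 2, 2, 2, 1] -> the third image falls in the second batch

# For each batch I pass one copy of the full label list per image,
# so the processor sees a list of lists (here: 2 x 72).
texts_for_first_batch = [labels] * len(batches[0])
```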
I don’t understand what these shapes refer to. Do they describe the image being processed or the underlying network? Maybe I made a mistake somewhere, or am I using too many labels? Thanks.