Hi,
I am trying to use the GIT multimodal model (microsoft/git-base-textvqa) for Visual Question Answering. The shape of the logits returned by the forward function is not (batch_size, sequence_length, config.vocab_size) as stated in the documentation (GIT). Is this a bug? Kindly help.
Code to reproduce the issue is given below:
import torch
from transformers import AutoProcessor, AutoModelForCausalLM
from huggingface_hub import hf_hub_download
from PIL import Image
processor = AutoProcessor.from_pretrained("microsoft/git-base-textvqa")
model = AutoModelForCausalLM.from_pretrained("microsoft/git-base-textvqa")
file_path = hf_hub_download(repo_id="nielsr/textvqa-sample", filename="bus.png", repo_type="dataset")
image = Image.open(file_path).convert("RGB")
pixel_values = processor(images=image, return_tensors="pt").pixel_values
question = "what does the front of the bus say at the top?"
input_ids = processor(text=question, add_special_tokens=False).input_ids
input_ids = [processor.tokenizer.cls_token_id] + input_ids  # manually prepend the CLS token, as in the GIT docs example
input_ids = torch.tensor(input_ids).unsqueeze(0)
print(input_ids.shape) # torch.Size([1, 13])
output = model(pixel_values=pixel_values, input_ids=input_ids)
logits = output.logits
print(logits.shape) # torch.Size([1, 914, 30522])
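
For reference, here is a minimal check of the mismatch, assuming the documented sequence_length refers to the length of input_ids (13 here) rather than the combined image+text sequence:

# Hypothetical check: compare the shape I expected from the docs with what is returned
expected_shape = (input_ids.shape[0], input_ids.shape[1], model.config.vocab_size)
print(expected_shape)       # (1, 13, 30522) -- my reading of the documentation
print(tuple(logits.shape))  # (1, 914, 30522) -- what the model actually returns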