Using BartForConditionalGeneration with a batch_size = 2 … all the inputs look right but when I examine the logits the shape is torch.Size([2, 1, 50264])
Great that you solved it!
If you want to compute the loss on your own, you could use BartModel (which does not require labels) instead of BartForConditionalGeneration.
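If it helps, here is a minimal PyTorch sketch of that idea (my own illustration, not from the original thread). It assumes prepare_seq2seq_batch returns a 'labels' key and a reasonably recent transformers version; it projects BartModel's decoder hidden states onto the vocabulary with the shared embedding matrix (the same weight tying BartForConditionalGeneration uses) and computes the cross-entropy by hand:

import torch.nn.functional as F
from transformers import BartModel, BartTokenizer

tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')
model = BartModel.from_pretrained('facebook/bart-large-cnn')

batch = tokenizer.prepare_seq2seq_batch(
    ['My friends are cool but they eat too many carbs.'],
    ['I buy them vegetable.'],
    return_tensors='pt')  # assumed to contain input_ids, attention_mask, labels

outputs = model(input_ids=batch['input_ids'],
                attention_mask=batch['attention_mask'],
                decoder_input_ids=batch['labels'],  # one simple teacher-forcing convention
                return_dict=True)

# Decoder hidden states -> vocabulary logits via the shared embedding matrix
logits = outputs.last_hidden_state @ model.shared.weight.T  # (batch, tgt_len, vocab)

# Each position predicts the next target token, so shift by one
loss = F.cross_entropy(logits[:, :-1].reshape(-1, logits.size(-1)),
                       batch['labels'][:, 1:].reshape(-1),
                       ignore_index=tokenizer.pad_token_id)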
=======================
As for the original question, the following code gave me outputs with the correct shape (I used TF), but it seems padding='max_length' is needed:
from transformers import TFBartForConditionalGeneration, BartTokenizer

model = TFBartForConditionalGeneration.from_pretrained('facebook/bart-large-cnn')
tokenizer = BartTokenizer.from_pretrained('facebook/bart-large-cnn')

src_texts = ['My friends are cool but they eat too many carbs. I really want them to be healthy, so I buy them vegetable.']
tgt_texts = ['I buy them vegetable.']

# pad both source and target up to the model's max length
x = tokenizer.prepare_seq2seq_batch(src_texts, tgt_texts, return_tensors='tf', padding='max_length')
out = model(x)
Note: in my case, padding had to be specified as 'max_length'; without it, calling the model failed.
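For what it's worth, padding='max_length' should pad every sequence to the tokenizer's model_max_length (1024 for bart-large-cnn), so a quick check like this (my own sketch, assuming the batch exposes 'input_ids' and 'labels' keys) shows what the model actually receives:

print(x['input_ids'].shape)  # should be (1, 1024): source padded to model_max_length
print(x['labels'].shape)     # should be (1, 1024): target padded the same way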