Why does the transformer decoder always generate output of the same length as the gold labels?

I am generating summaries with a fine-tuned BART model, and I’ve noticed something strange. If I feed the labels to the model, it always generates summaries of the same length as the labels, whereas if I do not pass the labels, it generates outputs of length 1024 (BART's maximum sequence length). This is unexpected, so I’m trying to understand whether there is a problem or bug in the reproducible example below.

from transformers import AutoModelForSeq2SeqLM, AutoTokenizer
model = AutoModelForSeq2SeqLM.from_pretrained('facebook/bart-large-cnn')
tokenizer = AutoTokenizer.from_pretrained('facebook/bart-large-cnn')


sentence_to_summarize = ['This is a text to summarise. I just went for a walk in the park and saw very large crowds gathering to watch an impromptu football match']
encoded_dict = tokenizer.batch_encode_plus(sentence_to_summarize, return_tensors='pt', max_length=1024, padding='max_length')
input_ids = encoded_dict['input_ids']
attention_mask = encoded_dict['attention_mask']
label = tokenizer.encode('I went to the park', return_tensors='pt')
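
As a quick sanity check of the setup above (just tensor shapes, nothing model-specific), I can print what the tokenizer produced:

print(input_ids.shape)  # torch.Size([1, 1024]) -- padded to max_length
print(label.shape)      # torch.Size([1, 7]) -- '<s>' + 5 word tokens + '</s>'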

Now compare the following two cases. Case 1:

output = model(input_ids=input_ids, attention_mask=attention_mask)
print(output['logits'].shape)

The shape printed is torch.Size([1, 1024, 50264]).

Case 2:

output = model(input_ids=input_ids, attention_mask=attention_mask, labels=label)
print(output['logits'].shape)

The shape printed is torch.Size([1, 7, 50264]), where 7 is the length of the label 'I went to the park' (including the start and end tokens).
Ideally the summarization model would learn when to generate the EOS token, but that should not always lead to summaries of identical length to the gold output (i.e. the label). Why is the label length influencing the model output in this way?
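
To double-check where the 7 comes from, the label ids can be decoded back to tokens (the exact sub-tokens below are what I'd expect from BART's BPE tokenizer, so treat them as indicative):

print(tokenizer.convert_ids_to_tokens(label[0].tolist()))
# expected: something like ['<s>', 'I', 'Ġwent', 'Ġto', 'Ġthe', 'Ġpark', '</s>'] -> 7 tokens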

I would expect the only difference between cases 1 and 2 to be that in the second case the output also contains the loss value, but I wouldn’t expect passing the labels to influence the logits in any way.
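
For completeness, here is the side-by-side comparison I have in mind (same model, tokenizer and inputs as above); as far as I understand, only the presence of the loss entry should differ:

out_no_labels = model(input_ids=input_ids, attention_mask=attention_mask)
out_with_labels = model(input_ids=input_ids, attention_mask=attention_mask, labels=label)

print(out_no_labels.keys())             # no 'loss' entry when labels are not passed
print(out_with_labels.keys())           # 'loss' appears once labels are passed
print(out_no_labels['logits'].shape)    # torch.Size([1, 1024, 50264])
print(out_with_labels['logits'].shape)  # torch.Size([1, 7, 50264])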