Why does the transformer decoder always generate output of the same length as the gold labels?

I am generating summaries with a fine-tuned BART model, and I've noticed something strange. If I pass the labels to the model, the output always has the same length as the label, whereas if I do not pass the labels, the output has length 1024 (the maximum BART sequence length). This is unexpected, so I'm trying to understand whether there is a problem or bug in the reproducible example below.

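from transformers import AutoModelForSeq2SeqLM, AutoTokenizer

# Stand-in checkpoint (an assumption for reproducibility); substitute the fine-tuned BART model
checkpoint = 'facebook/bart-large-cnn'
tokenizer = AutoTokenizer.from_pretrained(checkpoint)
model = AutoModelForSeq2SeqLM.from_pretrained(checkpoint)

sentence_to_summarize = ['This is a text to summarise. I just went for a walk in the park and saw very large crowds gathering to watch an impromptu football match']
# Pad the input to BART's maximum sequence length (1024 tokens)
encoded_dict = tokenizer.batch_encode_plus(sentence_to_summarize, return_tensors='pt', max_length=1024, padding='max_length')
input_ids = encoded_dict['input_ids']
attention_mask = encoded_dict['attention_mask']
label = tokenizer.encode('I went to the park', return_tensors='pt')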

Notice the following two cases. Case 1:

output = model(input_ids=input_ids, attention_mask=attention_mask)
print(output.logits.shape)

The printed shape is torch.Size([1, 1024, 50264]).

Case 2:

output = model(input_ids=input_ids, attention_mask=attention_mask, labels=label)
print(output.logits.shape)

The printed shape is torch.Size([1, 7, 50264]), where 7 is the length of the label 'I went to the park' (including the start and end tokens).
Ideally, the summarization model would learn when to generate the EOS token, but that should not always produce summaries of exactly the same length as the gold output (i.e. the label). Why does the label length influence the model output in this way?

I would expect the only difference between cases 1 and 2 to be that in the second case the output also contains the loss value. I wouldn't expect passing the labels to influence the logits in any way.
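
To make the two cases concrete, here is a toy sketch of how I understand teacher forcing in seq2seq models, assuming the decoder inputs are built by shifting the labels one position to the right, so that a single forward pass returns one next-token distribution per decoder input position instead of generating freely. The helpers `shift_right` and `forward_logit_positions` and the token ids are hypothetical, written in plain Python, and are not taken from the library.

```python
# Toy illustration only: plain Python, not the transformers implementation.
def shift_right(labels, start_token_id):
    """Build teacher-forcing decoder inputs: prepend a start token, drop the last label."""
    return [start_token_id] + labels[:-1]


def forward_logit_positions(decoder_input_ids):
    """A single forward pass scores one next-token distribution per decoder input position."""
    return len(decoder_input_ids)


# Hypothetical token ids standing in for the 7-token label.
labels = [0, 100, 439, 7, 5, 2221, 2]
decoder_input_ids = shift_right(labels, start_token_id=2)

print(decoder_input_ids)                           # [2, 0, 100, 439, 7, 5, 2221]
print(forward_logit_positions(decoder_input_ids))  # 7
```

If this assumption holds, the sequence dimension of the logits in a forward pass would simply mirror the length of the decoder inputs, and free-running generation that stops at EOS would need a separate decoding loop such as `model.generate`.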