Flan-T5 finetuning, predictions too short?

Hi guys, I'm trying to finetune a flan-t5-base model, following this guide:

It's working, but I'm not sure about the decoded_preds:

import numpy as np

# tokenizer and metric (e.g. evaluate.load("rouge")) are defined earlier in the script

def compute_metrics(eval_pred):
    predictions, labels = eval_pred
    # -100 marks padded label positions; replace with a valid id (0 = T5 pad) before decoding
    labels[labels < 0] = 0

    decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
    decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

    # Compute ROUGE scores
    result = metric.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)

    # Add mean generated length to metrics
    prediction_lens = []
    for pred in predictions:
        pred_len = len(pred)
        print("pred_len: ", pred_len)
        prediction_lens.append(pred_len)

    result["gen_len"] = np.mean(prediction_lens)
    print("result[gen_len]: ", result["gen_len"])

    return {k: round(v, 4) for k, v in result.items()}
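
(One thing I noticed while writing this up: len(pred) is the length of the full, possibly padded, prediction array rather than the number of generated tokens. A small variant that counts only non-pad tokens would look roughly like this, assuming tokenizer.pad_token_id is set as usual for T5, where it is 0:)

    # count only non-pad tokens so padding does not inflate gen_len
    prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions]
    result["gen_len"] = np.mean(prediction_lens)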

The problem is that the output during training seems too short; generation finishes too early compared to the decoded labels:
decoded_preds: Chatbot: 2 Hallo Frau Test, vielen Dank für die Anfrage bez
decoded_labels: Chatbot: 4 Hallo Frau Muster, vielen Dank für die Anfrage bezgl. Notifikation Änderung E-Mailadresse>. Wir konnten ihr Anliegen wie folgt lösen: Die E-Mailadresse wurde durch das Sekretariat im System gelöscht. Freundliche Grüsse Max Mustermann>.
prediction_lens: 19
result[gen_len]: 19.0
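
For context, the eval predictions come from Seq2SeqTrainer with predict_with_generate=True. A rough sketch of how I understand the relevant wiring (the values below are placeholders, not my exact settings from the guide):

    from transformers import Seq2SeqTrainer, Seq2SeqTrainingArguments

    # generation_max_length caps how many tokens are generated during eval;
    # if it is not set, the model's generation config default is used instead
    training_args = Seq2SeqTrainingArguments(
        output_dir="flan-t5-base-finetuned",   # placeholder
        predict_with_generate=True,            # decode real generations for ROUGE
        generation_max_length=128,             # placeholder cap on eval generation length
        per_device_eval_batch_size=8,          # placeholder
    )

    trainer = Seq2SeqTrainer(
        model=model,                     # model, tokenizer, datasets, collator as in the guide
        args=training_args,
        train_dataset=tokenized_train,
        eval_dataset=tokenized_eval,
        tokenizer=tokenizer,
        data_collator=data_collator,
        compute_metrics=compute_metrics,
    )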

I've been searching for the cause for days now; does anyone have a tip?
thanks a lot…