Using the Pegasus model for transfer learning generates garbage summaries

Hi,
I am using PegasusForConditionalGeneration from the transformers library for transfer learning to generate summaries of chats from the SAMSum dataset, but the generated output is repetitive and completely unrelated to the context.
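For generation, I'm calling generate roughly like this (simplified sketch; the google/pegasus-large checkpoint name and the example chat text are just placeholders):

    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    # Load the Pegasus checkpoint and its tokenizer
    tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-large")
    model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-large")

    chat = "Hannah: Hey, do you have Betty's number?\nAmanda: Let me check."
    inputs = tokenizer(chat, truncation=True, padding="longest", return_tensors="pt")

    # Beam-search decoding of the summary
    summary_ids = model.generate(inputs["input_ids"],
                                 attention_mask=inputs["attention_mask"],
                                 num_beams=4, max_length=60)
    print(tokenizer.decode(summary_ids[0], skip_special_tokens=True))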

I am also getting nan as the loss. Can someone help me figure out what went wrong?

Please note that the same code works perfectly fine for T5ForConditionalGeneration.
Is there any difference in the implementation method?
Should I try a different loss function or optimizer?
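
For context, the only thing that changes between the two runs is the model and tokenizer class; the checkpoint names below are just examples to illustrate the swap, not necessarily the exact ones I used:

    from transformers import T5ForConditionalGeneration, T5Tokenizer
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    # This run trains fine and gives sensible summaries
    model = T5ForConditionalGeneration.from_pretrained("t5-base")
    tokenizer = T5Tokenizer.from_pretrained("t5-base")

    # This run gives nan loss and repetitive, unrelated summaries
    model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-large")
    tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-large")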

Could you post the command you are using for fine-tuning?

I’m using a Python script for this. I’m sharing a snippet of it below:

    for _ in range(epochs):
        self.model.train()
        train_loss = 0
        # Iterate over batches of tokenized chats and target summaries
        for idx, data in tqdm(enumerate(self.train_loader)):
            self.optimizer.zero_grad()
            output = self.model(input_ids=data["input_ids"],
                                attention_mask=data["attention_mask"],
                                lm_labels=data["lm_labels"])
            # The model returns the loss first, then the token-level prediction scores
            loss, prediction_scores = output[:2]
            train_loss += loss.item()
            loss.backward()
            self.optimizer.step()
            if (idx % 1000) == 0:
                print("loss: ", loss.item(), " train_loss: ", train_loss / (idx + 1))

As the variable names suggest, self.model is a PegasusForConditionalGeneration instance, self.optimizer is an Adam optimizer, and self.train_loader is a DataLoader.
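
They are set up roughly like this (simplified sketch; the batch size, learning rate, and the collate function are stand-ins for my actual preprocessing code):

    import torch
    from torch.utils.data import DataLoader
    from datasets import load_dataset
    from transformers import PegasusForConditionalGeneration, PegasusTokenizer

    tokenizer = PegasusTokenizer.from_pretrained("google/pegasus-large")

    def collate(batch):
        # Each samsum example has a "dialogue" and a reference "summary"
        inputs = tokenizer([ex["dialogue"] for ex in batch], truncation=True,
                           padding="longest", return_tensors="pt")
        targets = tokenizer([ex["summary"] for ex in batch], truncation=True,
                            padding="longest", return_tensors="pt")
        return {"input_ids": inputs["input_ids"],
                "attention_mask": inputs["attention_mask"],
                "lm_labels": targets["input_ids"]}

    train_dataset = load_dataset("samsum", split="train")

    # These end up as self.model, self.optimizer and self.train_loader in my trainer class
    model = PegasusForConditionalGeneration.from_pretrained("google/pegasus-large")
    optimizer = torch.optim.Adam(model.parameters(), lr=5e-5)
    train_loader = DataLoader(train_dataset, batch_size=4, shuffle=True, collate_fn=collate)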