Calculating the ROUGE metric when fine-tuning Pegasus

I’ve been fine-tuning Pegasus using the Trainer class from Hugging Face.
I tried to implement the ROUGE metric using this method, but every time, the error shown below the code is raised.

import nltk
import numpy as np
from datasets import load_metric

def compute_metrics(eval_pred):
   predictions, labels = eval_pred
   decoded_preds = tokenizer.batch_decode(predictions, skip_special_tokens=True)
   # Replace -100 in the labels as we can't decode them.
   labels = np.where(labels != -100, labels, tokenizer.pad_token_id)
   decoded_labels = tokenizer.batch_decode(labels, skip_special_tokens=True)

   # Rouge expects a newline after each sentence
   decoded_preds = ["\n".join(nltk.sent_tokenize(pred.strip())) for pred in decoded_preds]
   decoded_labels = ["\n".join(nltk.sent_tokenize(label.strip())) for label in decoded_labels]
   rouge = load_metric('rouge')
   result = rouge.compute(predictions=decoded_preds, references=decoded_labels, use_stemmer=True)
   # Extract a few results
   result = {key: value.mid.fmeasure * 100 for key, value in result.items()}

   # Add mean generated length
   prediction_lens = [np.count_nonzero(pred != tokenizer.pad_token_id) for pred in predictions]
   result["gen_len"] = np.mean(prediction_lens)

   return {k: round(v, 4) for k, v in result.items()}

TypeError: int() argument must be a string, a bytes-like object or a number, not 'list'

We are using the PegasusForConditionalGeneration class from transformers,
and according to this post, the predictions in the eval_pred parameter are a tuple, which I can’t use to compute the ROUGE scores.
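
One possible workaround, as a minimal sketch rather than a confirmed fix: if the predictions really do arrive as a tuple, the first element could be pulled out and reduced to token IDs before calling tokenizer.batch_decode. The helper name unpack_predictions is hypothetical, and the sketch assumes that the first tuple element holds logits of shape (batch, seq_len, vocab_size).

```python
import numpy as np

def unpack_predictions(predictions):
    """Return a 2-D array of token IDs from a tuple or array of predictions."""
    # Assumption: when predictions is a tuple, its first element holds the logits.
    if isinstance(predictions, tuple):
        predictions = predictions[0]
    predictions = np.asarray(predictions)
    # Assumption: a 3-D array is (batch, seq_len, vocab_size) logits,
    # so the argmax over the last axis gives one token ID per position.
    if predictions.ndim == 3:
        predictions = predictions.argmax(axis=-1)
    return predictions

# Toy stand-in for what eval_pred might contain: (logits, extra model output).
logits = np.zeros((2, 3, 5))
logits[0, 0, 4] = 1.0
logits[1, 2, 2] = 1.0
extra_output = np.zeros((2, 3, 8))

token_ids = unpack_predictions((logits, extra_output))
print(token_ids.shape)  # (2, 3)
```

Inside compute_metrics this would be called right after unpacking eval_pred, so that batch_decode receives integer IDs. Note that taking the argmax of the logits is not the same as running generation; another option that may apply here is Seq2SeqTrainer with predict_with_generate=True in Seq2SeqTrainingArguments, which is intended to pass generated token IDs to compute_metrics.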

We are using this fine-tuning script as a base.

Is there any way around this?