Using perplexity as a metric during training

I have followed the AutoModelForSequenceClassification tutorial for fine-tuning with accuracy and other metrics, and everything went fine.

Now I am following the AutoModelForMaskedLM tutorial for domain adaptation, and that is also working well.

However, I am struggling to use a metric in the same way as before, so that perplexity is reported after each epoch.

Of course, the following works, but perplexity is only reported before and after training, not after each epoch:

import math

eval_results = trainer.evaluate()
print(f">>> Perplexity: {math.exp(eval_results['eval_loss']):.2f}")

However, the following training argument does not achieve that:

metric_for_best_model='perplexity',
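
For context, my TrainingArguments look roughly like this (a sketch; output_dir is a placeholder and I have omitted the rest of my arguments):

from transformers import TrainingArguments

# Sketch of my setup; output_dir is a placeholder.
training_args = TrainingArguments(
    output_dir="mlm-domain-adaptation",
    evaluation_strategy="epoch",  # evaluate at the end of every epoch
    logging_strategy="epoch",     # log metrics at the end of every epoch
    # metric_for_best_model='perplexity',  # tried this, but it has no effect on its own
)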

I have also tried passing the following argument to the Trainer:

compute_metrics=compute_metrics,

The question is, how do I define the function?

def compute_metrics(eval_results):
    return math.exp(eval_results['eval_loss'])

That does not work either.
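
My best guess is that compute_metrics has to return a dict keyed by the metric name, and that it receives an EvalPrediction (logits and label ids) rather than the eval_loss, so the loss would have to be recomputed inside it. Something like the sketch below is what I have in mind, but I am not sure it is right:

import math

import torch
from transformers import EvalPrediction

# Sketch: recompute the masked-LM cross-entropy from the logits and labels,
# exponentiate it, and return a dict of named metrics.
def compute_metrics(eval_pred: EvalPrediction):
    logits = torch.from_numpy(eval_pred.predictions)
    labels = torch.from_numpy(eval_pred.label_ids)
    loss = torch.nn.functional.cross_entropy(
        logits.reshape(-1, logits.shape[-1]),
        labels.reshape(-1),
        ignore_index=-100,  # non-masked positions are labelled -100
    )
    return {"perplexity": math.exp(loss.item())}

Is something along these lines the intended approach, or is there a simpler way to get perplexity reported after each epoch?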

Best,

Ed