How to fine-tune mT5

I am fine-tuning mT5 for summarization in a language other than English. Even after training for 30 epochs, the generated summaries are very poor, with a ROUGE-1 of 31.5, whereas mBART reaches a ROUGE-1 of 43.1 after only 11 epochs.
Is mT5 expected to perform this much worse than mBART, or am I doing something wrong?
I would appreciate any help. Thank you 🙂
