Training loss not dropping for MT5ForSequenceClassification

Hi,

I am using mT5 for sequence classification for the first time. Basically, I am doing cross-lingual NLI on XNLI with the official MT5ForSequenceClassification class, which attaches a classification head to the decoder's output. The same code worked well for mBART and T5 (English, French, and German only). But after I switched to mT5 (I literally just changed the model name string), the training loss never dropped. I have tried various learning rates from 1e-3 to 1e-6. No luck.
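For reference, a minimal sketch of the setup I mean, where the model name string is the only thing that changes between the mBART/T5 runs and the mT5 run (the helper names and learning-rate sweep are just mine, not an official recipe; it assumes a transformers release recent enough to include MT5ForSequenceClassification):

```python
def lr_sweep():
    # Learning rates tried above, from 1e-3 down to 1e-6.
    return [1e-3, 1e-4, 1e-5, 1e-6]

def build_classifier(model_name: str, num_labels: int = 3):
    # Hypothetical helper: swapping model_name is the only change between runs,
    # e.g. "t5-base" (trains fine) vs. "google/mt5-base" (loss stays flat).
    from transformers import AutoTokenizer, MT5ForSequenceClassification
    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = MT5ForSequenceClassification.from_pretrained(model_name, num_labels=num_labels)
    return tokenizer, model

def training_step(model, tokenizer, premise, hypothesis, label, optimizer):
    # One step on an XNLI pair: the classification head sits on the decoder
    # output, so the loss comes straight from model(..., labels=...).
    import torch
    batch = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)
    out = model(**batch, labels=torch.tensor([label]))
    out.loss.backward()
    optimizer.step()
    optimizer.zero_grad()
    return out.loss.item()
```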

Has anyone had a similar experience? Any suggestions?

Hi,
Did you manage to solve this problem? It seems I've run into the same issue. Moreover, I tried modeling my classification problem on the same data as a generative task, using MT5ForConditionalGeneration and forcing the model to generate the correct classification labels as text, and surprisingly that worked quite well. But the straightforward approach doesn't work; the loss doesn't decrease. I am using the 'google/mt5-base' model.
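A rough sketch of that generative formulation, assuming XNLI's three labels are verbalized as plain text (the prompt wording and helper names here are my own guesses, not a documented recipe):

```python
XNLI_LABELS = ["entailment", "neutral", "contradiction"]  # standard XNLI classes

def build_input(premise: str, hypothesis: str) -> str:
    # Verbalize an NLI pair into a single text-to-text prompt.
    # The exact prompt format is an arbitrary choice on my part.
    return f"xnli premise: {premise} hypothesis: {hypothesis}"

def parse_label(generated: str) -> int:
    # Map the generated text back to a class id; -1 for anything unexpected.
    text = generated.strip().lower()
    return XNLI_LABELS.index(text) if text in XNLI_LABELS else -1

def predict(model, tokenizer, premise: str, hypothesis: str) -> int:
    # model / tokenizer would come from, e.g.:
    #   MT5ForConditionalGeneration.from_pretrained("google/mt5-base")
    #   AutoTokenizer.from_pretrained("google/mt5-base")
    inputs = tokenizer(build_input(premise, hypothesis), return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=5)
    return parse_label(tokenizer.decode(out[0], skip_special_tokens=True))
```

Training then fine-tunes with the label string as the decoder target instead of a class id, which is what seems to sidestep the flat-loss problem here.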

Regards,
Anatoly