Hi,
I am using mT5 for sequence classification for the first time. Basically I am doing cross-lingual NLI on XNLI, using the official MT5ForSequenceClassification class, which attaches a fully connected classification head to the decoder's output. The same code worked well for mBART and T5 (English, French, and German only), but after I switched to mT5 (I literally just changed the model name string), the training loss never dropped. I have tried learning rates from 1e-3 down to 1e-6, with no luck.
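For reference, a minimal sketch of the setup (the example sentences and label mapping here are just illustrative, not my actual data pipeline):

```python
import torch
from transformers import AutoTokenizer, MT5ForSequenceClassification

# XNLI has three labels: entailment, neutral, contradiction
tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5ForSequenceClassification.from_pretrained(
    "google/mt5-base", num_labels=3
)

premise = "The cat sat on the mat."
hypothesis = "An animal is on the mat."
# premise/hypothesis are encoded as a sentence pair
inputs = tokenizer(premise, hypothesis, return_tensors="pt", truncation=True)

labels = torch.tensor([0])  # 0 = entailment in this sketch
outputs = model(**inputs, labels=labels)
print(outputs.loss)  # this is the loss that never decreases for me
```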
Has anyone had a similar experience? Any suggestions?
Hi,
Did you manage to solve this problem? I seem to have hit the same issue. Moreover, I tried modeling the classification problem on the same data as a generative problem, using MT5ForConditionalGeneration and forcing it to generate the correct classification labels as text, and surprisingly that worked quite well (roughly as in the sketch below). But the straightforward approach doesn't work; the loss doesn't decrease. I use the "google/mt5-base" model.
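The generative workaround looks roughly like this (a minimal sketch; the prompt format and label strings are one possible choice, not the only one):

```python
from transformers import AutoTokenizer, MT5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("google/mt5-base")
model = MT5ForConditionalGeneration.from_pretrained("google/mt5-base")

premise = "The cat sat on the mat."
hypothesis = "An animal is on the mat."
text = f"xnli premise: {premise} hypothesis: {hypothesis}"

inputs = tokenizer(text, return_tensors="pt", truncation=True)
# the gold label is encoded as target text, e.g. "entailment"
labels = tokenizer(text_target="entailment", return_tensors="pt").input_ids

# standard seq2seq cross-entropy over the label tokens
loss = model(**inputs, labels=labels).loss

# at inference time, generate the label string and map it back to a class
pred_ids = model.generate(**inputs, max_new_tokens=5)
print(tokenizer.decode(pred_ids[0], skip_special_tokens=True))
```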
Regards,
Anatoly
I think so. I don't remember exactly what fixed it (I tried a bunch of things not related to this issue), but the problem is gone. Most likely it was switching to a larger model and doing full fine-tuning instead of LoRA.
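In case it helps anyone, the difference is roughly this (a sketch only; "google/mt5-large" stands in for "a larger model", and the peft config with q/v target modules is an assumption based on common T5-style LoRA setups, not reconstructed from my old script):

```python
from transformers import MT5ForSequenceClassification
from peft import LoraConfig, TaskType, get_peft_model

# full fine-tuning: every parameter receives gradients;
# nothing to freeze, just pass model.parameters() to the optimizer
model = MT5ForSequenceClassification.from_pretrained(
    "google/mt5-large", num_labels=3  # assumed "larger model"
)

# LoRA variant (what I had before): only small adapter matrices train
lora_model = MT5ForSequenceClassification.from_pretrained(
    "google/mt5-large", num_labels=3
)
config = LoraConfig(
    task_type=TaskType.SEQ_CLS,
    r=8,
    lora_alpha=16,
    target_modules=["q", "v"],  # attention projections in T5-style blocks
)
lora_model = get_peft_model(lora_model, config)
lora_model.print_trainable_parameters()
```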