Training loss does not drop for MT5ForSequenceClassification

forrestbao · February 17, 2024, 5:46am

Hi,

I am using mT5 for sequence classification for the first time. Specifically, I am doing cross-lingual NLI on XNLI. I am using the official MT5ForSequenceClassification class, which attaches a fully connected classification head to the decoder’s output. The same code worked well for mBART and T5 (English, French, and German only). But after I switched to mT5 (I literally just changed the model name string), the training loss never dropped. I have tried learning rates from 1e-3 to 1e-6, with no luck.
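A quick way to tell whether the loss "never dropped" because the model is stuck predicting at chance level is to compare the logged loss against the cross-entropy of a uniform guess, which is ln(3) ≈ 1.0986 for the three NLI classes. The sketch below is only a diagnostic aid, not part of the original code; the function names, the tolerance, and the window size are all hypothetical choices.

```python
import math


def chance_level_loss(num_classes: int) -> float:
    """Cross-entropy (in nats) of a uniform prediction over num_classes labels."""
    return math.log(num_classes)


def looks_stuck(losses, num_classes, tol=0.05, window=50):
    """Return True if the mean of the last `window` losses sits within
    `tol` of the chance-level loss.

    `tol` and `window` are arbitrary illustrative defaults, not tuned values.
    """
    recent = list(losses)[-window:]
    if not recent:
        raise ValueError("no losses recorded")
    mean_recent = sum(recent) / len(recent)
    return abs(mean_recent - chance_level_loss(num_classes)) <= tol


# Example: training losses logged for a 3-class NLI run (made-up numbers).
logged = [1.12, 1.10, 1.09, 1.11, 1.10]
print(looks_stuck(logged, num_classes=3))
```

A plateau at chance level only says that the classifier output carries no signal yet; it does not say why. A loss that is NaN or far above ln(3) points to a different kind of problem and is not covered by this check.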

Has anyone had a similar experience? Any suggestions?

astarostin1983 · June 23, 2024, 12:08am

Hi,
Did you manage to solve this problem? It seems I have run into the same issue. I also tried framing my classification problem, on the same data, as a generation problem: I used MT5ForConditionalGeneration and trained it to generate the correct classification labels as text. Surprisingly, that worked quite well. But the straightforward approach doesn’t work: the loss doesn’t decrease. I use the ‘google/mt5-base’ model.
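The generative framing described above can be sketched roughly as follows. This is not the poster's actual code: the prompt template, the helper names, and the label order (entailment, neutral, contradiction) are assumptions and should be checked against the dataset being used. `encode_batch` expects a Hugging Face tokenizer, for example one loaded with `AutoTokenizer.from_pretrained("google/mt5-base")`.

```python
# Assumed label order; verify it against the label names of your dataset.
XNLI_LABELS = ("entailment", "neutral", "contradiction")


def to_text_pair(premise, hypothesis, label_id):
    """Turn one NLI example into (source text, target text).

    The "xnli: premise: ... hypothesis: ..." template is a hypothetical
    choice; any consistent template should work.
    """
    source = f"xnli: premise: {premise} hypothesis: {hypothesis}"
    target = XNLI_LABELS[label_id]
    return source, target


def from_generated_text(text):
    """Map generated text back to a label id, or -1 if it matches no label."""
    cleaned = text.strip().lower()
    return XNLI_LABELS.index(cleaned) if cleaned in XNLI_LABELS else -1


def encode_batch(tokenizer, examples, max_length=256):
    """Tokenize (premise, hypothesis, label_id) triples for a seq2seq model.

    Passing `text_target` makes the tokenizer return the tokenized label
    strings under the "labels" key. Padding is left to a collator such as
    DataCollatorForSeq2Seq, which pads labels with -100 so that padded
    positions are ignored by the loss.
    """
    pairs = [to_text_pair(*ex) for ex in examples]
    sources = [s for s, _ in pairs]
    targets = [t for _, t in pairs]
    return tokenizer(
        sources, text_target=targets, max_length=max_length, truncation=True
    )


src, tgt = to_text_pair("A man is eating.", "Someone is eating.", 0)
print(src)
print(tgt)
```

At evaluation time, the text returned by `model.generate` would be decoded and passed through `from_generated_text`; outputs that match no label come back as -1 and can be counted as errors.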

Regards,
Anatoly

forrestbao · January 1, 2025, 3:55am

I think so. I don’t remember exactly what I did (I tried a bunch of things unrelated to this issue), but the problem is gone. I probably solved it by using a larger model and doing full fine-tuning instead of LoRA.
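Since the fix may have involved moving from LoRA to full fine-tuning, one thing worth ruling out in a LoRA setup is that the newly initialized classification head ended up frozen along with the base model. That is only a guess at a possible cause, not something confirmed in this thread. The sketch below lists which parameters are actually trainable; the helper names are hypothetical, and the `classification_head` substring assumes the head keeps that attribute name in the parameter names of the wrapped model.

```python
def trainable_summary(model):
    """Return (trainable_count, total_count, trainable_names) for any
    object exposing named_parameters(), such as a torch.nn.Module."""
    trainable = 0
    total = 0
    names = []
    for name, param in model.named_parameters():
        n = param.numel()
        total += n
        if param.requires_grad:
            trainable += n
            names.append(name)
    return trainable, total, names


def head_is_trainable(model, head_substring="classification_head"):
    """True if at least one trainable parameter name contains head_substring.

    The default substring is an assumption about how the head is named.
    """
    _, _, names = trainable_summary(model)
    return any(head_substring in name for name in names)
```

Calling `trainable_summary(model)` right before training shows whether the trainable set matches expectations. If `head_is_trainable(model)` returns False while the adapters are trainable, the classifier is stuck at its random initialization, which would be consistent with a flat loss.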
