I am trying to fine-tune an mT5-base model and evaluate it on the Spanish portion of the XNLI dataset.
My training dataset is the NLI dataset machine-translated to Spanish with a MarianMT model, so the quality isn't the best, but I have still managed to get good results training other models on it, such as XLM-RoBERTa.
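Since mT5 has no classification head, I cast NLI into a text-to-text format, roughly along these lines (the prefix and label words below are just illustrative choices, not anything standard):

```python
# Map integer NLI labels to target words for seq2seq training.
# The exact prefix and label verbalizations are my own choices.
LABEL_WORDS = {0: "entailment", 1: "neutral", 2: "contradiction"}

def to_text_pair(premise: str, hypothesis: str, label: int):
    """Build (input, target) strings for text-to-text fine-tuning."""
    source = f"xnli premise: {premise} hypothesis: {hypothesis}"
    target = LABEL_WORDS[label]
    return source, target
```

At prediction time I decode the generated string and map it back to one of the three labels.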
Also, given the size of the NLI dataset, I am only training on 10% of it (keeping the same label proportions), which is still 40,000 examples.
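For reference, the stratified 10% subset is taken per label, something like this sketch (function name and details are illustrative):

```python
import random
from collections import defaultdict

def stratified_sample(examples, labels, fraction, seed=0):
    """Take `fraction` of the examples from each label bucket,
    so the subset keeps the original label proportions."""
    rng = random.Random(seed)
    by_label = defaultdict(list)
    for ex, lab in zip(examples, labels):
        by_label[lab].append(ex)
    sampled = []
    for lab in sorted(by_label):
        items = by_label[lab]
        rng.shuffle(items)
        k = round(len(items) * fraction)
        sampled.extend(items[:k])
    return sampled
```

So the class imbalance itself should not be what is causing the model to collapse onto one label.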
The problem is that at some point the loss gets stuck and the model always predicts the same class, so I am looking for hints on how to make training effective by changing hyperparameters, or to hear from anyone who has run into the same problem.
I have tried both AdamW and Adafactor, with learning rates ranging from 1e-3 down to 1e-5, and I always get the same result.
Any help will be appreciated. Thank you very much!