Can't reproduce xlm-roberta-large finetuned result on XNLI

spencer97 · March 9, 2021, 5:43pm

I’m trying to fine-tune xlm-roberta-large on the English MNLI training data and then evaluate it zero-shot on the XNLI dataset.
However, I found that xlm-roberta-large is very sensitive to hyperparameters. The reported average accuracy is 80.9, while my model only reaches 79.74, about 1.2 points below the reported figure.

I used the Adam optimizer with a learning rate of 5e-6 and a batch size of 16. Can anyone suggest better hyperparameters for reproducing the XNLI result of xlm-roberta-large?

joeddav · March 9, 2021, 5:47pm

What is the “reported accuracy” you’re trying to reproduce? Accuracy on XNLI? On zero-shot classification? What dataset? Are you trying to reproduce joeddav/xlm-roberta-large-xnli? If so I’m afraid I don’t have the exact hyperparameters I used, but I’ll also note that I trained that before the XNLI train set was released, so it was actually trained on the concatenation of the XNLI dev & test sets and the MNLI train set.

spencer97 · March 10, 2021, 6:04am

Hi joeddav, thanks for your reply! I have tried your model, and its performance is great!

However, I’m trying to reproduce the cross-lingual transfer result shown in Table 1 of the original paper, which fine-tunes the multilingual model on the English training set and tests it on the XNLI test set. Therefore, I think my model shouldn’t have access to the XNLI dev and test sets during training.
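For reference, here is a minimal sketch of how an "average accuracy" like the one discussed above could be computed. It assumes the reported figure is an unweighted mean of per-language test accuracies over the 15 XNLI languages. The function names and the toy predictions are hypothetical and are not taken from the paper's evaluation code.

```python
from collections import defaultdict

# The 15 languages covered by XNLI.
XNLI_LANGS = ["ar", "bg", "de", "el", "en", "es", "fr", "hi",
              "ru", "sw", "th", "tr", "ur", "vi", "zh"]


def per_language_accuracy(examples):
    """Return {lang: accuracy} from (lang, gold_label, predicted_label) triples."""
    correct = defaultdict(int)
    total = defaultdict(int)
    for lang, gold, pred in examples:
        total[lang] += 1
        correct[lang] += int(gold == pred)
    return {lang: correct[lang] / total[lang] for lang in total}


def average_accuracy(acc_by_lang):
    """Unweighted (macro) mean over languages. Assumption: this is the reported metric."""
    return sum(acc_by_lang.values()) / len(acc_by_lang)


# Toy predictions for two languages only (hypothetical data, not real results).
toy = [
    ("en", "entailment", "entailment"),
    ("en", "neutral", "contradiction"),
    ("de", "neutral", "neutral"),
    ("de", "contradiction", "contradiction"),
]
toy_acc = per_language_accuracy(toy)
toy_avg = average_accuracy(toy_acc)
```

A gap of about one point in this average can come from a few weak languages, so comparing the per-language numbers against the paper's table may be more informative than comparing the mean alone.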

The issue below is quite similar to my question.

github.com/facebookresearch/fairseq

XNLI Results Reproduction of XLM-R

opened 09:35AM - 24 Apr 20 UTC

MGithubGA

question stale

Thanks for the impressive work of XLM-R. Recently I found that the results on XNLI are updated: the avg-acc of XLM-R_base is increased from 74.6 to 76.1. I can obtain the best results 74.6 by finetuning 5 epochs with lr=1e-5 with batch size of 32, weight decay 0.1, and 10% warm up. I have also tried the suggestion by @kartikayk from [Issue-1367](https://github.com/pytorch/fairseq/issues/1367), but it seems doesn't work for me. I learn the model with batch size of 32 and 4-step grad accumulation, 5K steps for each epoch, and fixed lr=5e-6 or lr=5e-6 with a linear decay of lr and 10% warm up. However, I cannot obtain the results of 76.1. Maybe I miss some important details. Could you provide me more details or your finetuning code? Thanks.
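To make the schedule in the quoted issue concrete, here is a rough sketch of a learning rate with 10% linear warmup followed by linear decay, together with the effective batch size implied by gradient accumulation. The batch size, accumulation steps, peak learning rate, and steps per epoch come from the issue. The epoch count, the reading of "5K steps" as optimizer steps, and all function and variable names are assumptions. The shape mirrors what `get_linear_schedule_with_warmup` in 🤗 Transformers produces, but this is plain Python and not that implementation.

```python
def linear_warmup_decay(step, total_steps, warmup_ratio=0.1):
    """Multiplier in [0, 1]: linear ramp-up during warmup, then linear decay to 0."""
    warmup_steps = int(total_steps * warmup_ratio)
    if step < warmup_steps:
        return step / max(1, warmup_steps)
    return max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))


# Values taken from the quoted issue.
batch_size = 32
grad_accum_steps = 4
peak_lr = 5e-6
steps_per_epoch = 5000  # assumption: these are optimizer steps

# Assumption: 5 epochs, borrowed from the first configuration in the issue.
epochs = 5

# Gradient accumulation multiplies the number of examples per optimizer update.
effective_batch = batch_size * grad_accum_steps
total_steps = steps_per_epoch * epochs


def lr_at(step):
    """Learning rate at a given optimizer step under the schedule above."""
    return peak_lr * linear_warmup_decay(step, total_steps)
```

Under these assumptions the effective batch size is 128, which is 8 times the batch size of 16 used at the top of this thread, so the two setups are not directly comparable at the same learning rate.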
