Can't reproduce xlm-roberta-large fine-tuned results on XNLI

I’m trying to fine-tune xlm-roberta-large on the English MNLI training data and then do zero-shot cross-lingual classification on the XNLI dataset.
However, I found that xlm-roberta-large is very sensitive to hyperparameters. The reported average accuracy is 80.9, while my model only reaches 79.74, roughly 1.2 points lower.

I used the Adam optimizer with a 5e-6 learning rate and a batch size of 16. Can anyone suggest better hyperparameters to reproduce the XNLI results of xlm-roberta-large?
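For reference, here's a minimal sketch of my setup with Hugging Face Transformers (not my exact script; the epoch count, warmup ratio, weight decay, and max length are guesses I'd be happy to have corrected):

```python
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "xlm-roberta-large"
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=3)

# English MNLI training data only (zero-shot transfer to other languages)
mnli = load_dataset("multi_nli", split="train")

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)  # max length is an assumption

mnli = mnli.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="./xlmr-large-mnli",   # hypothetical output path
    learning_rate=5e-6,               # the LR from my runs
    per_device_train_batch_size=16,   # the batch size from my runs
    num_train_epochs=2,               # assumption, not something I've tuned carefully
    warmup_ratio=0.06,                # assumption: warmup often stabilizes XLM-R
    weight_decay=0.01,                # assumption
    fp16=True,
)

# Trainer uses AdamW by default, close to the plain Adam I mentioned above
trainer = Trainer(model=model, args=args, train_dataset=mnli, tokenizer=tokenizer)
trainer.train()
```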

What is the “reported accuracy” you’re trying to reproduce? Accuracy on XNLI? On zero-shot classification? What dataset? Are you trying to reproduce joeddav/xlm-roberta-large-xnli? If so, I’m afraid I don’t have the exact hyperparameters I used. I’ll also note that I trained that model before the XNLI train set was released, so it was actually trained on the concatenation of the XNLI dev & test sets and the MNLI train set.
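I don't have the exact data script either, but with the current `datasets` library that concatenation would look roughly like this (the splits and language list are my reconstruction, not the original code):

```python
from datasets import load_dataset, concatenate_datasets

# MNLI train set; keep just the NLI columns so features line up for concatenation
mnli = load_dataset("multi_nli", split="train")
mnli = mnli.remove_columns(
    [c for c in mnli.column_names if c not in ("premise", "hypothesis", "label")]
)

# XNLI dev + test for every language (the XNLI train split didn't exist yet)
langs = ["en", "fr", "es", "de", "el", "bg", "ru", "tr", "ar",
         "vi", "th", "zh", "hi", "sw", "ur"]
xnli_parts = [load_dataset("xnli", lang, split="validation+test") for lang in langs]

# MNLI and XNLI share the entailment/neutral/contradiction label scheme,
# so the datasets should concatenate cleanly
combined = concatenate_datasets([mnli] + xnli_parts).shuffle(seed=42)
```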

Hi joeddav, thanks for your reply! I have tried your model, and its performance is great!

However, I’m trying to reproduce the cross-lingual transfer results shown in Table 1 of the original paper, where the multilingual model is fine-tuned on the English training set and evaluated on the XNLI test sets. Therefore, I think my model shouldn’t see the XNLI dev & test sets during training.
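For completeness, this is roughly how I'm evaluating zero-shot transfer per language (a sketch; the checkpoint path and max length are placeholders):

```python
import numpy as np
from datasets import load_dataset
from transformers import AutoModelForSequenceClassification, AutoTokenizer, Trainer

# hypothetical checkpoint from fine-tuning on English MNLI only
ckpt = "./xlmr-large-mnli"
tokenizer = AutoTokenizer.from_pretrained(ckpt)
model = AutoModelForSequenceClassification.from_pretrained(ckpt)

def tokenize(batch):
    return tokenizer(batch["premise"], batch["hypothesis"],
                     truncation=True, max_length=128)

trainer = Trainer(model=model, tokenizer=tokenizer)

langs = ["en", "fr", "es", "de", "el", "bg", "ru", "tr", "ar",
         "vi", "th", "zh", "hi", "sw", "ur"]
accs = {}
for lang in langs:
    test = load_dataset("xnli", lang, split="test").map(tokenize, batched=True)
    out = trainer.predict(test)
    accs[lang] = float((out.predictions.argmax(-1) == out.label_ids).mean())

print(accs)
print("average:", np.mean(list(accs.values())))  # the paper's Table 1 reports 80.9
```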

The issue below is quite similar to my question.