I tried training mBERT and XLM-R large on the PAWS-X English paraphrase detection dataset, and it looks like XLM-R large is not converging, while mBERT does. I've tried tweaking the hyperparameters for XLM-R large, but that doesn't seem to help either.
Attaching training stats for both models below (note: I'm evaluating every 100 steps).
I've modified the HF run_glue Colab example to reproduce this behavior here. For hyperparameters, I'm following the ones used in the XTREME paper, where they report better results for XLM-R large than for mBERT on PAWS-X.
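For context, here is a minimal sketch of the kind of run_glue invocation involved. The file paths, output directory, and flag values below are illustrative placeholders, not my exact settings or the exact XTREME hyperparameters:

```shell
# Hypothetical invocation of the HF run_glue script on English PAWS-X,
# with XLM-R large as the pretrained model. Data file paths and
# hyperparameter values are placeholders for illustration only.
python run_glue.py \
  --model_name_or_path xlm-roberta-large \
  --train_file pawsx_en_train.json \
  --validation_file pawsx_en_dev.json \
  --do_train \
  --do_eval \
  --evaluation_strategy steps \
  --eval_steps 100 \
  --learning_rate 1e-5 \
  --per_device_train_batch_size 16 \
  --num_train_epochs 3 \
  --max_seq_length 128 \
  --output_dir ./xlmr-large-pawsx
```

Swapping `--model_name_or_path` to `bert-base-multilingual-cased` with the same flags is how I compare against mBERT.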
Would appreciate it if someone could take a quick look and share any suggestions.