Reproduce BERT and RoBERTa

wyu1 · June 23, 2021, 10:54pm

Hi to all,

I tried to use the run_mlm.py script to reproduce the results of bert-base-uncased. However, my reproduced results are consistently lower than the ones reported on this website by the Hugging Face team.

Task	Metric	Reported by Hugging Face	Our reproduced result
CoLA	Matthews corr.	56.53	47.92
SST-2	Accuracy	92.32	87.56
MRPC	F1/Accuracy	88.85/84.07	82.03/80.97
STS-B	Pearson/Spearman corr.	88.64/88.48	82.45/82.76
QQP	Accuracy/F1	90.71/87.49	88.23/86.12
MNLI	Matched acc./Mismatched acc.	83.91/84.10	82.34/83.01
QNLI	Accuracy	90.66	85.45
RTE	Accuracy	65.70	56.95

I think there must be some problem with my experiment. My setup was as follows:

(1) I used the code in this file without any changes.

(2) I loaded the bookcorpus and wiki datasets directly from the datasets library; the text is chunked into sequences of 512 tokens.

(3) I tried two configurations: batch size 256 for 1M steps, and batch size 8K for 50K steps. Both gave results worse than the reported numbers.
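To make points (2) and (3) concrete, here is a minimal sketch of the chunking and of the training budget each configuration implies. It is not the code from run_mlm.py; `group_texts`, `sequences_seen`, and `tokens_seen` are illustrative helpers written for this post, and "8K" is assumed to mean 8192.

```python
from itertools import chain


def group_texts(tokenized_docs, block_size=512):
    """Concatenate token-id lists and split them into fixed-size blocks.

    Illustrative helper: the trailing remainder that does not fill a
    whole block is dropped, so every returned block has block_size ids.
    """
    concatenated = list(chain.from_iterable(tokenized_docs))
    usable = (len(concatenated) // block_size) * block_size
    return [concatenated[i:i + block_size] for i in range(0, usable, block_size)]


def sequences_seen(batch_size, steps):
    """Total number of training sequences processed over a run."""
    return batch_size * steps


def tokens_seen(batch_size, steps, block_size=512):
    """Total tokens processed, assuming every sequence is a full block."""
    return sequences_seen(batch_size, steps) * block_size


if __name__ == "__main__":
    # Assumption: "8K" in the post means a batch size of 8192.
    configs = [("batch 256, 1M steps", 256, 1_000_000),
               ("batch 8K, 50K steps", 8192, 50_000)]
    for name, batch_size, steps in configs:
        print(name, sequences_seen(batch_size, steps), tokens_seen(batch_size, steps))
```

Under these assumptions the two runs do not see the same amount of data: the first processes 256M sequences and the second about 410M, so they are not directly comparable with each other either.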

I would really appreciate it if you could provide a script that I can use to reproduce BERT or RoBERTa. Thank you very much!

JackBAI · July 24, 2023, 10:20am

It’s quite common to get lower scores than the official team, probably due to randomness. But I’m still interested in why this is the case. Do you have any updates on this? Looking forward to hearing them! @wyu1
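One way to check whether randomness alone could explain a gap is to fine-tune with several seeds and compare the reported score against the spread. The sketch below assumes you have collected one score per seed yourself; `summarize_seeds` and `gap_in_stdevs` are hypothetical helper names, not part of any library.

```python
from statistics import mean, stdev


def summarize_seeds(scores):
    """Return (mean, sample standard deviation) of a metric across seeds."""
    if len(scores) < 2:
        raise ValueError("need scores from at least two seeds")
    return mean(scores), stdev(scores)


def gap_in_stdevs(reported, scores):
    """How many sample standard deviations `reported` lies above the seed mean."""
    mu, sigma = summarize_seeds(scores)
    if sigma == 0:
        raise ValueError("all seeds gave the same score; the gap is undefined")
    return (reported - mu) / sigma
```

If the reported number sits many standard deviations above the mean of your own runs, seed noise is an unlikely explanation on its own and the pretraining setup is worth a closer look.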
