How to reproduce the performance of bert-large-uncased-whole-word-masking-finetuned-squad?

bengul · July 25, 2021, 10:22am

I have been trying to reproduce the results of the model bert-large-uncased-whole-word-masking-finetuned-squad on Hugging Face. The model page reports f1 = 93.15 and exact_match = 86.91, but I am getting "f1": 43.75 and "exact": 39.02. I have been scratching my head for a few days trying to figure out why the difference in performance is so large. I have attached my Colab notebook (Google Colaboratory).

What am I missing? Any help is highly appreciated.
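
For reference, here is a minimal sketch of how SQuAD v1.1-style exact match and F1 are computed for a single prediction. It assumes the normalization used by the official v1.1 evaluation script (lowercase, strip punctuation, drop the articles a/an/the, collapse whitespace). The function names are illustrative and are not taken from the notebook, and the sketch does not handle the unanswerable questions that SQuAD v2 adds.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation, drop articles, collapse whitespace."""
    text = text.lower()
    punctuation = set(string.punctuation)
    text = "".join(ch for ch in text if ch not in punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and reference."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # Multiset intersection counts each shared token at most as often
    # as it appears in both answers.
    common = Counter(pred_tokens) & Counter(truth_tokens)
    num_same = sum(common.values())
    if num_same == 0:
        return 0.0
    precision = num_same / len(pred_tokens)
    recall = num_same / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def score(prediction, truths):
    """Best score over all reference answers for one question."""
    return {
        "exact_match": max(exact_match(prediction, t) for t in truths),
        "f1": max(f1_score(prediction, t) for t in truths),
    }
```

The numbers reported on the model page correspond to these per-question scores averaged over the dev set and multiplied by 100.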
