Dataset confusion for distilroberta trained on squad2

aishgupt · November 17, 2021, 11:55am

Hi,

According to the model card for twmkn9/distilroberta-base-squad2 · Hugging Face, the reported exact-match and F1 scores were computed over a total of 6078 evaluation instances. The official SQuAD 2.0 dev set, however, has 11873 instances (see the official website).

I searched Google for a subset with 6078 instances and found squad/data at master · elgeish/squad · GitHub, which does contain 6078 instances.
But even on this subset, running the same run_squad.py script gives me exact-match and F1 scores of only about 58 and 62, while the reported numbers are 70 and 74.
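
For anyone who wants to check an instance count themselves, here is a minimal sketch of how the questions in a SQuAD 2.0-style JSON file can be counted. It assumes the standard layout (data → paragraphs → qas, with an is_impossible flag on each question). The function name and the tiny in-memory example are made up for illustration and are not taken from run_squad.py or the model card.

```python
import json


def count_squad_questions(squad):
    """Count questions in a SQuAD 2.0-style dict, split by answerability."""
    counts = {"total": 0, "has_answer": 0, "no_answer": 0}
    for article in squad["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                counts["total"] += 1
                # SQuAD 2.0 flags unanswerable questions with is_impossible=True;
                # a missing flag is treated here as answerable (assumption).
                if qa.get("is_impossible", False):
                    counts["no_answer"] += 1
                else:
                    counts["has_answer"] += 1
    return counts


# Hypothetical toy data in the same layout, only to show the call.
example = {
    "data": [
        {
            "title": "toy",
            "paragraphs": [
                {
                    "context": "Paris is the capital of France.",
                    "qas": [
                        {"id": "q1", "question": "What is the capital of France?",
                         "is_impossible": False,
                         "answers": [{"text": "Paris", "answer_start": 0}]},
                        {"id": "q2", "question": "What is the capital of Spain?",
                         "is_impossible": True, "answers": []},
                        {"id": "q3", "question": "Which country has Paris as capital?",
                         "is_impossible": False,
                         "answers": [{"text": "France", "answer_start": 24}]},
                    ],
                }
            ],
        }
    ]
}

print(count_squad_questions(example))

# For a real file, load it first (the path is a placeholder):
# with open("dev-v2.0.json") as f:
#     print(count_squad_questions(json.load(f)))
```

Running this on both the official dev file and the subset would show whether they differ only in size or also in the ratio of answerable to unanswerable questions, which can shift the scores noticeably.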

Could someone help confirm whether the published model was trained properly, and clarify which dev dataset was used for the reported scores?
