Distribution shift in SQuADv1?

jubueche · July 20, 2022, 8:12am

Hi,

I am fine-tuning BERT-base on SQuADv1. When I hold out part of the training set as validation data (used for evaluation during training), the F1 on it is consistently lower than the F1 on the test set (the split named ‘validation’ in the dataset). When I instead take the validation data from the test set (the ‘validation’ split of SQuADv1), I get the same or a higher F1, which is what I would expect.

The only explanation I can think of is a distribution shift between the training and validation sets of SQuADv1.
Is that the case?

All the best,
Julian
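
One thing that may be worth checking before concluding there is a distribution shift: as far as I know, the SQuAD v1.1 training split has a single reference answer per question, while the ‘validation’ split has several, and the official metric takes the best score over all references. The sketch below is a simplified re-implementation of that SQuAD-style token F1, not the official script. The helper names and the toy prediction and references are made up for illustration. It shows how the same prediction scores higher when more references are available.

```python
import re
import string
from collections import Counter


def normalize(text):
    """Lowercase, drop punctuation and articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def token_f1(prediction, reference):
    """Token-overlap F1 between one prediction and one reference answer."""
    pred_tokens = normalize(prediction).split()
    ref_tokens = normalize(reference).split()
    if not pred_tokens or not ref_tokens:
        # Both empty counts as a match, otherwise a miss.
        return float(pred_tokens == ref_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(ref_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(ref_tokens)
    return 2 * precision * recall / (precision + recall)


def best_f1(prediction, references):
    """Score against every reference and keep the best one."""
    return max(token_f1(prediction, ref) for ref in references)


# Hypothetical example: one prediction, scored two ways.
prediction = "Denver Broncos"
single_reference = ["the Denver Broncos team"]  # train-style: one answer
multi_reference = ["the Denver Broncos team", "Denver Broncos", "Broncos"]

f1_single = best_f1(prediction, single_reference)
f1_multi = best_f1(prediction, multi_reference)
print(f"single reference: {f1_single:.2f}, multiple references: {f1_multi:.2f}")
```

If this effect is the cause, a held-out slice of the training split would score lower than the ‘validation’ split even with identical model quality, so counting the reference answers per question in each split would be a quick first check.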
