[Question Answering] Why does the SQuAD training set contain only one possible answer per sample?


I am going through the Hugging Face course chapter on question answering.

When it introduces the SQuAD v2 dataset: in train-v2.0.json there is only one answer per question, while in dev-v2.0.json and the hidden test-v2.0.json there are several answers for a given question.

I do not understand why the training set must contain only one answer.

Suppose multiple spans share the exact same text as the correct answer: for example, the correct answer is “20th century” and it appears 3 times in the context, but in the training set only the first occurrence of “20th century” is annotated as positive.

I think this will confuse the model, because even if it predicts that the 2nd “20th century” in the context is the answer, the loss function will tell it that it is wrong.
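Here is a toy sketch of what I mean (plain Python, no real model; the context string, candidate positions, and logit values are all made up for illustration). The model's predicted start position points at the 2nd occurrence of the answer text, yet the cross-entropy against the single annotated start position still produces a large loss:

```python
import math

context = ("Physics advanced rapidly in the 20th century. "
           "Computing was born in the 20th century, and genetics "
           "was transformed in the 20th century as well.")
answer = "20th century"

# Find every start offset where the answer text occurs in the context.
starts = []
i = context.find(answer)
while i != -1:
    starts.append(i)
    i = context.find(answer, i + 1)

# SQuAD-style training annotation: only one gold start (here, the first).
gold_start = starts[0]

# Toy start logits restricted to the candidate positions:
# the model is confident the 2nd occurrence is the answer.
logits = [5.0 if s == starts[1] else 0.0 for s in starts]

# Cross-entropy against the single gold label.
exps = [math.exp(l) for l in logits]
total = sum(exps)
probs = [e / total for e in exps]
loss = -math.log(probs[starts.index(gold_start)])

print(len(starts))  # 3 occurrences of the same answer text
print(round(loss, 2))  # large loss despite the predicted text being correct
```

Even though the text at `starts[1]` is character-for-character identical to the gold answer, the loss only rewards the annotated position.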

Am I right?