What's the difference between a QA model trained with SQuAD1.0 and SQuAD2.0?

Hi guys,

I’d like to understand whether there is any architectural difference between a Question-Answering model trained on the SQuAD1.0 dataset and one trained on the SQuAD2.0 dataset. From what I understand, a model trained on SQuAD2.0 should be able to recognize when no answer can be provided for a given question-context pair. Does it do so by giving a lower score to the most likely answer in the context?
Moreover, how exactly is the score of an answer calculated? (I’m referring to the score provided by the question-answering pipeline.)

Hi @z051m4, architecturally there’s no difference. SQuADv2 adds adversarial questions that look like regular questions but have no answer in the context. With SQuAD2, models are trained to point both the start and end index at the first special token (the `<bos>`/`[CLS]` token) when no answer is present, instead of returning a wrong span. This enables the model to differentiate between answerable and unanswerable questions.
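To make this concrete, here’s a toy sketch (not the actual pipeline code; all names and logit values are made up) of how a SQuAD2-style null-answer check can work: the score of the null span (start = end = position 0, the first special token) is compared against the best non-null span.

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def best_span(start_logits, end_logits, max_len=15):
    """Return (score, start, end) for the highest-scoring span.
    Index 0 plays the role of the first special token, so a result of
    (score, 0, 0) means the model predicts "no answer"."""
    start_p = softmax(start_logits)
    end_p = softmax(end_logits)
    # Score of the null answer: start and end both at position 0.
    best = (start_p[0] * end_p[0], 0, 0)
    # Scan all valid non-null spans up to max_len tokens long.
    for s in range(1, len(start_p)):
        for e in range(s, min(s + max_len, len(end_p))):
            score = start_p[s] * end_p[e]
            if score > best[0]:
                best = (score, s, e)
    return best

# Toy logits where the model is most confident in position 0:
score, s, e = best_span([5.0, 0.1, 0.2, 0.1], [5.0, 0.1, 0.1, 0.3])
print(s, e)  # 0 0 -> interpreted as "no answer"
```

With logits that instead peak inside the context, `best_span` returns a real span, which is the behavior you see from a SQuAD1.0-only model on every input.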


Thanks @valhalla, so I guess that, given the tokenized context in the form
`<bos> context-tokens... <eos>`, the model, after the softmax, will output the maximum probability for both the start index and the end index at the first token (`<bos>`).
If we call the start-index probability `p1` and the end-index probability `p2`, is it correct to say that the score is the mean of `p1` and `p2`?
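For what it’s worth, as far as I can tell from the question-answering pipeline, the span score is the product `p1 * p2` of the two softmax probabilities rather than their mean. A toy sketch of that calculation (illustrative only, not the pipeline’s actual code):

```python
import math

def softmax(logits):
    """Numerically stable softmax over a list of logits."""
    m = max(logits)
    exps = [math.exp(x - m) for x in logits]
    total = sum(exps)
    return [e / total for e in exps]

def span_score(start_logits, end_logits, start_idx, end_idx):
    """Score of a candidate span: the start-index probability (p1)
    multiplied by the end-index probability (p2)."""
    p1 = softmax(start_logits)[start_idx]
    p2 = softmax(end_logits)[end_idx]
    return p1 * p2

# Toy logits favoring start=1 and end=2:
print(span_score([0.5, 3.0, 0.1, 0.2], [0.3, 0.1, 2.5, 0.4], 1, 2))
```

With uniform logits the product makes the difference obvious: two positions each at probability 0.5 give a score of 0.25, whereas the mean would give 0.5.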