Based on HF documentation, unanswerable questions from SQuAD 2.0 don't make it into train/val data

melody-ju · November 16, 2020, 3:07pm

Hi, I wanted to fine-tune ELECTRA on my own SQuAD 2.0-style dataset, so I looked at the following documentation to figure out what the data format should be.

It seems that in the walkthrough only answerable questions actually make it into the training/validation datasets. In the JSON files, if a question cannot be answered, the “answers” array is empty. However, in the walkthrough, this is how a (context, question, answer) triplet gets added to the data:

    for answer in qa['answers']:
        contexts.append(context)
        questions.append(question)
        answers.append(answer)

Because it’s iterating through the “answers” array, if I’m not mistaken, the body of the loop never runs for an unanswerable question, so those questions never get added to the data.
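To make the issue concrete, here is a minimal sketch of a reader that keeps unanswerable questions as well. It assumes the usual SQuAD 2.0 JSON layout (`data` → `paragraphs` → `qas`); the function name `read_squad_examples` and the empty-answer placeholder (`text=""`, `answer_start=-1`) are hypothetical choices for illustration, not something taken from the walkthrough.

```python
def read_squad_examples(squad_dict):
    """Flatten a SQuAD 2.0-style dict into parallel lists,
    keeping questions whose "answers" array is empty."""
    contexts, questions, answers = [], [], []
    for group in squad_dict["data"]:
        for passage in group["paragraphs"]:
            context = passage["context"]
            for qa in passage["qas"]:
                question = qa["question"]
                if qa["answers"]:
                    # Answerable: one example per annotated answer,
                    # same as the walkthrough loop.
                    for answer in qa["answers"]:
                        contexts.append(context)
                        questions.append(question)
                        answers.append(answer)
                else:
                    # Unanswerable: keep the question with a placeholder
                    # answer (hypothetical convention, assumed here).
                    contexts.append(context)
                    questions.append(question)
                    answers.append({"text": "", "answer_start": -1})
    return contexts, questions, answers


# Tiny made-up sample with one answerable and one unanswerable question.
sample = {
    "data": [{
        "paragraphs": [{
            "context": "Paris is the capital of France.",
            "qas": [
                {"question": "What is the capital of France?",
                 "answers": [{"text": "Paris", "answer_start": 0}]},
                {"question": "What is the capital of Spain?",
                 "answers": []},
            ],
        }]
    }]
}

contexts, questions, answers = read_squad_examples(sample)
```

With the original loop, the second question in `sample` would be dropped; here it is kept, and a later preprocessing step would still need to turn the placeholder into whatever label the model expects for "no answer".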

melody-ju · November 25, 2020, 1:07pm

@valhalla @sgugger Not sure if you two are the right people to tag but thought I’d start somewhere!

melody-ju · November 25, 2020, 1:10pm

Asking because I’m not sure how to feed the model unanswerable questions in training, since the example in the doc just seems to ignore them, and they’re a big part of SQuAD 2.0.

Jung · December 3, 2020, 7:01am

Hi melody, I believe you are right.

Conceptually, I think we need to set the target logits (both start and end) to be all zeros for all unanswerable questions. We would also need to set or tune a threshold on the predicted logits to decide whether a question is unanswerable or not (this means modifying the official example a bit).
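One common way to make this concrete is to point both the start and end labels of an unanswerable question at position 0 (the `[CLS]` token for BERT-style tokenizers), and at prediction time compare the "null" score at that position with the best real span. The sketch below is only an illustration under those assumptions, not the official example's code; the helper names `target_positions` and `is_unanswerable`, the default threshold of 0.0, and `max_answer_len` are all hypothetical.

```python
def target_positions(answer_token_span, cls_index=0):
    """Return (start, end) training labels.

    answer_token_span is a (start, end) token index pair, or None
    when the question is unanswerable. Unanswerable questions are
    labelled with the [CLS] position (assumed to be index 0).
    """
    if answer_token_span is None:
        return cls_index, cls_index
    return answer_token_span


def is_unanswerable(start_logits, end_logits, threshold=0.0, max_answer_len=30):
    """Decide 'no answer' by comparing the null score with the best span.

    The null score is the sum of start and end logits at position 0.
    The question is called unanswerable when the null score exceeds
    the best non-null span score by more than `threshold`.
    """
    null_score = start_logits[0] + end_logits[0]
    best_span_score = float("-inf")
    for start in range(1, len(start_logits)):
        last_end = min(start + max_answer_len, len(end_logits))
        for end in range(start, last_end):
            score = start_logits[start] + end_logits[end]
            best_span_score = max(best_span_score, score)
    return null_score - best_span_score > threshold


# Made-up logits: the first example favours position 0, the second a real span.
no_answer = is_unanswerable([5.0, 0.1, 0.2], [4.0, 0.3, 0.1])
has_answer = is_unanswerable([0.0, 3.0, 0.1], [0.0, 0.2, 4.0])
```

The threshold would normally be tuned on a validation set, since raising it makes the model less willing to say a question has no answer.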

sgugger · December 3, 2020, 2:01pm

Note that this is not the “official” example, but a simplified version for a tutorial. The official example is in examples/question-answering (will be further simplified very soon as I’m working on a PR) and does take into account the unanswerable questions.
