Hi, I wanted to fine-tune Electra on my own SQuAD 2.0-style dataset, so I looked at the following documentation to figure out what the data format should be.
From the walkthrough, it seems that only answerable questions actually make it into the training/validation datasets. In the JSON files, an unanswerable question has an empty “answers” array. However, in the walkthrough, this is how a (context, question, answer) triplet gets added to the data:
```python
for answer in qa['answers']:
    contexts.append(context)
    questions.append(question)
    answers.append(answer)
```
Because the loop iterates over the “answers” array, if I’m not mistaken, an unanswerable question (whose “answers” array is empty) never runs the loop body and so never gets added to the data.
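To double-check this, here's a small self-contained sketch. Only the inner `for answer in qa['answers']` loop comes from the walkthrough. The toy data, the outer loops over `data`/`paragraphs`/`qas` (standard SQuAD 2.0 JSON layout), and the function names are my own. The second function is just an assumption about one way to keep unanswerable questions (a placeholder empty answer), not something the walkthrough does.

```python
# Hypothetical toy data in SQuAD 2.0 layout: one answerable, one unanswerable question.
squad_like = {
    "data": [{
        "title": "Toy",
        "paragraphs": [{
            "context": "Paris is the capital of France.",
            "qas": [
                {"id": "q1", "question": "What is the capital of France?",
                 "answers": [{"text": "Paris", "answer_start": 0}],
                 "is_impossible": False},
                {"id": "q2", "question": "What is the capital of Mars?",
                 "answers": [], "is_impossible": True},
            ],
        }],
    }]
}


def read_like_walkthrough(squad_dict):
    """Mirror the walkthrough's loop: one triplet per entry in qa['answers']."""
    contexts, questions, answers = [], [], []
    for group in squad_dict["data"]:
        for passage in group["paragraphs"]:
            context = passage["context"]
            for qa in passage["qas"]:
                question = qa["question"]
                # An empty 'answers' list means this body never runs.
                for answer in qa["answers"]:
                    contexts.append(context)
                    questions.append(question)
                    answers.append(answer)
    return contexts, questions, answers


def read_keeping_unanswerable(squad_dict):
    """Hypothetical variant that also keeps questions with no answers.

    Assumption: an unanswerable question gets the placeholder answer
    {'text': '', 'answer_start': 0}; later preprocessing would still have to
    map it to a "no answer" label.
    """
    contexts, questions, answers = [], [], []
    for group in squad_dict["data"]:
        for passage in group["paragraphs"]:
            context = passage["context"]
            for qa in passage["qas"]:
                question = qa["question"]
                # Fall back to a single placeholder when 'answers' is empty.
                qa_answers = qa["answers"] or [{"text": "", "answer_start": 0}]
                for answer in qa_answers:
                    contexts.append(context)
                    questions.append(question)
                    answers.append(answer)
    return contexts, questions, answers


_, walk_q, _ = read_like_walkthrough(squad_like)
_, kept_q, kept_a = read_keeping_unanswerable(squad_like)
print(walk_q)  # ['What is the capital of France?']
print(kept_q)  # ['What is the capital of France?', 'What is the capital of Mars?']
```

With the walkthrough-style loop, only the answerable question survives, which is what I suspect is happening with the real data.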