Hi, I wanted to fine-tune ELECTRA on my own SQuAD 2.0-style dataset, so I looked at the following documentation to figure out what the data format should be.
From the walkthrough, it seems that only answerable questions actually make it into the training/validation datasets. In the JSON files, an unanswerable question has an empty “answers” array. However, in the walkthrough, this is how a (context, question, answer) triplet gets added to the data:
```python
for answer in qa['answers']:
    contexts.append(context)
    questions.append(question)
    answers.append(answer)
```
Because the loop iterates over the “answers” array, if I'm not mistaken, unanswerable questions (whose array is empty) will never get added to the data.
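For comparison, here is a minimal sketch of a parsing loop that keeps unanswerable questions instead of dropping them. It assumes the standard SQuAD 2.0 JSON layout (`data` → `paragraphs` → `qas`) and treats an empty `answers` list as "unanswerable". The function names `parse_squad_v2` and `read_squad_v2` and the empty-text placeholder answer are my own illustrative choices, not part of the walkthrough. Note that the later token-position step would also need to special-case these examples, for instance by pointing both start and end at the `[CLS]` token (index 0), which is the convention the Transformers `run_qa.py` example uses when no answer is given.

```python
import json
from typing import Dict, List, Tuple


def parse_squad_v2(squad_dict: Dict) -> Tuple[List[str], List[str], List[Dict]]:
    """Flatten SQuAD 2.0-style JSON into parallel lists, keeping unanswerable questions."""
    contexts, questions, answers = [], [], []
    for group in squad_dict["data"]:
        for passage in group["paragraphs"]:
            context = passage["context"]
            for qa in passage["qas"]:
                question = qa["question"]
                if qa["answers"]:
                    # Answerable: one triplet per gold answer, same as the walkthrough loop.
                    for answer in qa["answers"]:
                        contexts.append(context)
                        questions.append(question)
                        answers.append({**answer, "is_impossible": False})
                else:
                    # Unanswerable: keep the question with a hypothetical empty placeholder
                    # answer; downstream code must map these to the [CLS] position itself.
                    contexts.append(context)
                    questions.append(question)
                    answers.append({"text": "", "answer_start": 0, "is_impossible": True})
    return contexts, questions, answers


def read_squad_v2(path: str) -> Tuple[List[str], List[str], List[Dict]]:
    """Load a SQuAD 2.0-style JSON file from disk and flatten it."""
    with open(path, encoding="utf-8") as f:
        return parse_squad_v2(json.load(f))
```

With this version, every question in the file produces at least one (context, question, answer) triplet, and the `is_impossible` flag makes it easy to handle the unanswerable ones separately when computing start/end token positions.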