Shape of squad data for Question answering

pantograph · April 15, 2023, 12:26pm

Before this looks too naive even for a beginner, I will mention that I read every question I could find about the “squad” dataset that is loaded for the Question Answering example, so if I missed something, my apologies.
The question is simple. The current shape of the “squad” dataset looks like this:
DatasetDict({
    train: Dataset({
        features: ['version', 'data'],
        num_rows: 1
    })
    validation: Dataset({
        features: ['version', 'data'],
        num_rows: 1
    })
})
Whereas your old Colab notebooks show it in a different structure, such as this:

DatasetDict({
    train: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 87599
    })
    validation: Dataset({
        features: ['id', 'title', 'context', 'question', 'answers'],
        num_rows: 10570
    })
})

So I tried to play it safe by downloading the v1.1 dataset, and I got as far as the pre_process function successfully by running this on my data:
train_contexts, train_questions, train_answers = read_squad('squad/train.json')
val_contexts, val_questions, val_answers = read_squad('squad/dev.json')

Now my skills run out when we get to the preprocess_training_examples function: I am unable to pass the right object when we need to call the .map function to iterate through my dataset. Any help would be amazing, either by pointing me to a way of “forcing” v1.1 data through this process, or by showing how I could roll my current data structure into a form that supports this .map call. Kindly advise.

thanks.
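
For reference, the second option (rolling the raw v1.1 JSON into the flat id / title / context / question / answers layout shown above) could be sketched roughly as follows. This is only a minimal sketch: it assumes the downloaded files follow the standard SQuAD v1.1 nesting (data, then paragraphs, then qas), and the helper names flatten_squad and load_squad_columns are hypothetical, not part of any library or of the original notebook.

```python
import json


def flatten_squad(raw):
    """Flatten nested SQuAD v1.1 JSON into one row per question.

    Assumes the layout raw["data"][i]["paragraphs"][j]["qas"][k].
    Returns a dict of equal-length columns.
    """
    columns = {"id": [], "title": [], "context": [], "question": [], "answers": []}
    for article in raw["data"]:
        for paragraph in article["paragraphs"]:
            for qa in paragraph["qas"]:
                columns["id"].append(qa["id"])
                columns["title"].append(article.get("title", ""))
                columns["context"].append(paragraph["context"])
                columns["question"].append(qa["question"])
                # Store answers as parallel lists of texts and start offsets.
                columns["answers"].append({
                    "text": [a["text"] for a in qa["answers"]],
                    "answer_start": [a["answer_start"] for a in qa["answers"]],
                })
    return columns


def load_squad_columns(path):
    """Read a SQuAD v1.1 JSON file from disk and flatten it (hypothetical helper)."""
    with open(path, encoding="utf-8") as f:
        return flatten_squad(json.load(f))


# Tiny made-up sample in the v1.1 layout, used here instead of the real files.
sample = {
    "version": "1.1",
    "data": [{
        "title": "Example",
        "paragraphs": [{
            "context": "The pantograph collects current from overhead lines.",
            "qas": [{
                "id": "q1",
                "question": "What does the pantograph collect?",
                "answers": [{"text": "current", "answer_start": 24}],
            }],
        }],
    }],
}

columns = flatten_squad(sample)
```

If the datasets library is installed, passing a column dict like this to Dataset.from_dict should give an object that has a .map method, and the train and validation results could then be combined in a DatasetDict. Whether the tutorial's preprocessing function accepts it unchanged depends on it reading only the question, context and answers columns, which is an assumption here.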
