KeyError: 'data'

Hi,

I am training BERT on the DROP dataset. I downloaded the dataset and modified the passages and questions. When I ran the script on the modified dataset, it produced some results but then raised a KeyError at this line: input_data = json.load(reader)["data"]
I don’t know how to fix the problem. Could anyone give me some tips? That would be much appreciated! :)

I have found the problem, but I am not sure how to fix it.
This is the original SQuAD dataset:
[image: squad]
This is the DROP dataset:


And this is the code in the script:

So, can I delete the ["data"] here to fix the KeyError?

There may be an issue with the way your data is formatted. Could you please share the structure of your data?
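One quick way to answer that question is to print the top-level structure of the JSON file. The sketch below is only an illustration: the helper name describe_top_level is made up here, and it assumes the file is plain JSON readable with the standard json module.

```python
import json


def describe_top_level(path):
    """Summarize the top-level JSON structure of the file at `path`.

    Hypothetical helper for illustration; not part of any script in this thread.
    """
    with open(path, "r", encoding="utf-8") as reader:
        obj = json.load(reader)
    if isinstance(obj, dict):
        return {
            "type": "dict",
            # Show only the first few keys; a file keyed by passage id may have thousands.
            "first_keys": list(obj.keys())[:5],
            # The failing line expects this key to exist at the top level.
            "has_data_key": "data" in obj,
        }
    return {"type": type(obj).__name__, "first_keys": [], "has_data_key": False}
```

If has_data_key comes back False, the lookup json.load(reader)["data"] will raise KeyError: 'data' for that file.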

Thank you for replying. I have found the problem, but I am not sure how to fix it.
This is the DROP dataset:


This is the SQuAD dataset:
[image: squad]
And this is the code in the script:

So, can I delete the ["data"] here to fix the KeyError?

  • It looks like you are using the run_squad.py script from Huggingface’s repository. To use it unchanged, your data must follow the format Huggingface expects, which is described at the following link: squad_format.
  • The keys and values must match that format exactly. Several keys in your DROP dataset, such as passage and qa_pairs, are not ones the script expects, and I do not see context or answer_start in it. As a first step, review the format and restructure your DROP dataset accordingly.
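Since the script expects more than just a top-level "data" key, removing the ["data"] lookup alone would most likely just make it fail at a later step. A minimal sketch of the restructuring suggested above follows. It rests on several assumptions that should be checked against the actual files: that the DROP file is a dict keyed by passage id, that each entry has passage and qa_pairs, that each QA pair has question, query_id and an answer dict with a spans list, and that the target layout is data / paragraphs / context / qas / answers with text and answer_start. The function name drop_to_squad is made up for this example. Questions whose answer is a number or date, or whose span does not occur verbatim in the passage, are skipped because they have no answer_start.

```python
import json


def drop_to_squad(drop_data, version="drop-converted"):
    """Convert a DROP-style dict into a SQuAD-style dict (span answers only).

    Hypothetical helper; the field names are assumptions about both formats.
    """
    articles = []
    for passage_id, entry in drop_data.items():
        context = entry["passage"]
        qas = []
        for i, qa in enumerate(entry["qa_pairs"]):
            spans = qa.get("answer", {}).get("spans", [])
            if not spans:
                continue  # number/date answers have no span in the passage
            text = spans[0]
            start = context.find(text)  # character offset of the first match
            if start == -1:
                continue  # span not found verbatim, so no valid answer_start
            qas.append({
                "id": qa.get("query_id", f"{passage_id}-{i}"),
                "question": qa["question"],
                "answers": [{"text": text, "answer_start": start}],
            })
        if qas:
            articles.append({
                "title": passage_id,
                "paragraphs": [{"context": context, "qas": qas}],
            })
    return {"version": version, "data": articles}
```

A possible way to use it is to load the DROP file with json.load, pass the result to drop_to_squad, and write the returned dict out with json.dump before pointing the script at the new file. Only the first span is kept per question, which is a simplification.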