Load Dataset Fail for Custom Json Format

nanami · January 24, 2023, 8:43pm

I have a JSON file with the following format:
[ { "A" : string, "B": list of string, "C": list of list of bool }, sample2, sample3, …]
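To make the layout concrete, here is a minimal Python sketch that writes two hypothetical records in this shape. The field values and the temporary `train.json` path are made up for illustration. Calling `json.dump` on the whole list produces one top-level JSON array, which is the layout in question.

```python
import json
import os
import tempfile

# Two hypothetical records following the structure above:
# "A" is a string, "B" a list of strings, "C" a list of lists of booleans.
samples = [
    {"A": "first", "B": ["x", "y"], "C": [[True, False], [False]]},
    {"A": "second", "B": ["z"], "C": [[True], [True, True]]},
]

# Write into a throwaway directory so no real data file is touched.
path = os.path.join(tempfile.mkdtemp(), "train.json")

# json.dump on the whole list produces a single top-level JSON array.
with open(path, "w", encoding="utf-8") as f:
    json.dump(samples, f)

with open(path, encoding="utf-8") as f:
    text = f.read()

# True 0: the file is one line that opens with "[".
print(text.startswith("["), text.count("\n"))
```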

When I called load_dataset("json", data_files={"train": data_path + "Data/train.json"}), I got the following error:

datasets.builder.DatasetGenerationError: An error occurred while generating the dataset
pyarrow.lib.ArrowTypeError: Expected bytes, got a ‘list’ object
pyarrow.lib.ArrowInvalid: JSON parse error: Column() changed from object to array in row 0

What's wrong with my procedure? The only explanation I can think of is that load_dataset() doesn't support lists of lists, but I couldn't find this in the documentation, and the error message isn't explicit enough to tell whether that's the reason.

Thanks in advance for any input.

mariosasko · February 6, 2023, 2:11pm

Hi! What does "used : " mean? Can you please specify the structure inside a code block?

JSON arrays/lists are supported, but this is (still) not documented.

nanami · February 19, 2023, 2:57pm

Thanks for your response!
The "used" was a typo; the structure is:

[ { "A" : string, "B": list of string, "C": list of list of bool }, sample2, sample3, …]

So essentially the JSON is a list of dictionaries. Each dictionary has three string keys, whose values are a string, a list of strings, and a nested list of bools, respectively.
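Before changing the file layout, it may be worth ruling out malformed records: the `Expected bytes, got a 'list' object` message hints that some value arrived as a list where a string was expected. Below is a minimal stdlib sketch that checks every record against the structure described above. The helper names (`record_problems`, `find_bad_records`, etc.) are hypothetical and not part of `datasets`.

```python
import json


def is_list_of(value, check):
    """True if value is a list and every element passes check."""
    return isinstance(value, list) and all(check(item) for item in value)


def is_str(value):
    return isinstance(value, str)


def is_bool(value):
    # JSON true/false are decoded by the json module as Python bools.
    return isinstance(value, bool)


def is_bool_row(value):
    return is_list_of(value, is_bool)


def record_problems(record):
    """Describe how one record deviates from {"A": str, "B": [str], "C": [[bool]]}."""
    if not isinstance(record, dict):
        return ["record is not a JSON object"]
    problems = []
    if set(record) != {"A", "B", "C"}:
        problems.append(f"unexpected keys: {sorted(record)}")
    if not is_str(record.get("A")):
        problems.append("A is not a string")
    if not is_list_of(record.get("B"), is_str):
        problems.append("B is not a list of strings")
    if not is_list_of(record.get("C"), is_bool_row):
        problems.append("C is not a list of lists of bools")
    return problems


def find_bad_records(records):
    """Map each offending record's index to its list of problems."""
    bad = {}
    for index, record in enumerate(records):
        problems = record_problems(record)
        if problems:
            bad[index] = problems
    return bad


# Usage: the second record has a list where "A" should hold a string.
records = json.loads(
    '[{"A": "a", "B": ["b"], "C": [[true]]},'
    ' {"A": ["oops"], "B": ["b"], "C": [[false]]}]'
)
print(find_bad_records(records))  # {1: ['A is not a string']}
```

For the real file, `records` would come from `json.load` on the training file. An empty result means every record matches the described structure, which would point away from malformed data as the cause.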

merlinyx · June 20, 2023, 7:58pm

Hi, this is rather late, but I also ran into this issue and realized that the JSON format should be:

{ "A" : string, "B": list of string, "C": list of list of bool }
{ "A" : string, "B": list of string, "C": list of list of bool }
...
{ "A" : string, "B": list of string, "C": list of list of bool }

Basically, each dictionary is written as its own JSON string on a separate line (the JSON Lines format), instead of all the dictionaries being dumped to the file as a single list.
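For a file that already holds one big JSON array, here is a minimal standard-library sketch of that conversion. `json_array_to_jsonl` is a hypothetical helper name, and the paths in the usage note below are placeholders.

```python
import json


def json_array_to_jsonl(src_path, dst_path):
    """Rewrite a file holding one top-level JSON array of objects as JSON Lines."""
    with open(src_path, encoding="utf-8") as src:
        records = json.load(src)
    if not isinstance(records, list):
        raise ValueError("expected a top-level JSON array")
    with open(dst_path, "w", encoding="utf-8") as dst:
        for record in records:
            # One compact JSON object per line, separated by newlines.
            dst.write(json.dumps(record, ensure_ascii=False) + "\n")
    # Return the number of records written, as a quick sanity check.
    return len(records)
```

With the converted file, the call from the original question would point at the new path, e.g. `load_dataset("json", data_files={"train": data_path + "Data/train.jsonl"})`. `json.dumps` escapes newline characters inside string values, so each record is guaranteed to stay on a single line.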
