How to use load_dataset to load a json file with all three splits?

I have a single JSON file that contains the train, validation, and test splits together with their data.
If I load the file without the field argument, I get this error:

This JSON file contain the following fields: ['train', 'validation', 'test']. Select the correct one and provide it as `field='XXX'` to the dataset loading method.

But I can only use

load_dataset("json", data_files="xx.json", field="train")

to load one specific split at a time.
When I tried to use

load_dataset("json", data_files="xx.json", field=["train", "validation", "test"])

it does not seem to work.
I don’t think it should be necessary to split the data into separate files. Is there a better way to do this, or should I open an issue asking for support for a “list” field?


You can load each split separately:

ds_train = load_dataset("json", data_files="xx.json", field="train")["train"]
ds_test = load_dataset("json", data_files="xx.json", field="test")["train"]
ds_valid = load_dataset("json", data_files="xx.json", field="validation")["train"]

(You need to add ["train"] at the end because mapping fields to splits is not supported right now, so everything ends up in the “train” split.)
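
If you want the three results in one object, they can be gathered into a DatasetDict. This is only a minimal sketch: the helper name load_all_splits is hypothetical, and it assumes the top-level keys of the file are exactly the split names from the question. It passes split="train" to load_dataset instead of indexing with ["train"] afterwards.

```python
def load_all_splits(path, fields=("train", "validation", "test")):
    """Load each top-level field of one JSON file as its own split.

    Hypothetical helper: `path` is the combined file (e.g. "xx.json") and
    `fields` are assumed to be its top-level keys.
    """
    # Imported inside the function so the helper can be defined even where
    # the `datasets` library is not installed.
    from datasets import DatasetDict, load_dataset

    splits = {}
    for field in fields:
        # Everything the json loader reads lands in the "train" split,
        # so request that split directly for every field.
        splits[field] = load_dataset(
            "json", data_files=path, field=field, split="train"
        )
    return DatasetDict(splits)
```

Calling load_all_splits("xx.json") should then give a DatasetDict with "train", "validation" and "test" keys. Note that the file is read once per field.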

I agree it would be nice to be able to provide a split_name<->field_name mapping in the field argument. Feel free to open an issue!

You can save the data in three JSON files and use:

load_dataset("json", data_files={'train':'xx/train.json', 'validation':'xx/valid.json', 'test':'xx/test.json'})
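
If you don't want to split the file by hand, the standard library can do it. The following is a sketch under assumptions, not a definitive implementation: the helper name split_json_file is hypothetical, the output files are named after the fields (so validation.json rather than valid.json), and each top-level key is assumed to hold a list of records.

```python
import json
from pathlib import Path


def split_json_file(src, out_dir, fields=("train", "validation", "test")):
    """Write each top-level field of `src` to its own JSON file in `out_dir`.

    Returns a dict mapping split name -> file path.
    """
    with open(src, encoding="utf-8") as f:
        data = json.load(f)

    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)

    data_files = {}
    for field in fields:
        out_path = out_dir / f"{field}.json"
        # Each output file contains only the records stored under `field`.
        with open(out_path, "w", encoding="utf-8") as f:
            json.dump(data[field], f, ensure_ascii=False)
        data_files[field] = str(out_path)
    return data_files
```

The returned mapping has the same shape as the data_files dict above, so it can presumably be passed straight through as load_dataset("json", data_files=split_json_file("xx.json", "xx")).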