Json dump format for load_dataset

Hi! You can simply use `.to_json()` (see the documentation here).

Here is an example using SQuAD:

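```python
from datasets import load_dataset

# Load the SQuAD training split
squad = load_dataset("squad", split="train")
# Write it to disk as a JSON Lines file
squad.to_json("squad.json")

# Reload the file with the generic "json" loader
data_files = {"train": "squad.json"}
re_squad = load_dataset("json", data_files=data_files, split="train")
```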

This writes the dataset to a JSON Lines file (one JSON object per line), then reloads it using the JSON dataset loader 🙂
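If it helps to see what "JSON Lines" means concretely, here is a small standard-library sketch (it does not use `datasets`) that writes a couple of made-up records one JSON object per line and reads them back. The record contents and the `write_jsonl`/`read_jsonl` helpers are hypothetical and only illustrate the file layout; the exact serialization options `to_json()` uses may differ.

```python
import json
import os
import tempfile

# Hypothetical sample records, standing in for rows of a dataset such as SQuAD.
records = [
    {"id": "q1", "question": "Who wrote it?", "answers": {"text": ["Ann"], "answer_start": [0]}},
    {"id": "q2", "question": "When?", "answers": {"text": ["1990"], "answer_start": [12]}},
]

def write_jsonl(path, rows):
    """Write one JSON object per line (the JSON Lines layout)."""
    with open(path, "w", encoding="utf-8") as f:
        for row in rows:
            f.write(json.dumps(row, ensure_ascii=False) + "\n")

def read_jsonl(path):
    """Read a JSON Lines file back into a list of dicts, skipping blank lines."""
    with open(path, encoding="utf-8") as f:
        return [json.loads(line) for line in f if line.strip()]

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "sample.jsonl")
    write_jsonl(path, records)
    # Inspect the raw file: each record occupies exactly one line
    with open(path, encoding="utf-8") as f:
        raw_lines = f.read().splitlines()
    reloaded = read_jsonl(path)

print(len(raw_lines))        # 2: one line per record
print(reloaded == records)   # True: the round trip preserves the records
```

Because each line is an independent JSON object, a file like this can be read (or appended to) one record at a time, which is why it suits dataset dumps.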
