state.json does not reflect the split of the dataset

My raw_dataset includes a few long documents:

DatasetDict({
    train: Dataset({
        features: ['text'],
        num_rows: 174
    })
    test: Dataset({
        features: ['text'],
        num_rows: 31
    })
})

I managed to save it with raw_dataset.save_to_disk(dataset_path). When I checked the folder structure, it looked as it should. The train and test subfolders were created, and each contained three files: an Arrow data file, a dataset_info.json, and a state.json.

I do not understand why the two state.json files have the same _split value when one belongs to the train split and the other to the test split. The splits are correctly recognized in the dataset_dict.json file, which includes: {"splits": ["train", "test"]}
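For anyone who wants to reproduce the check, here is a minimal sketch of how the two values can be compared using only the standard library. It assumes the folder layout described above (a dataset_dict.json with a "splits" list at the top level, and a state.json with a "_split" key inside each split subfolder). The helper name read_split_metadata is mine, not part of the datasets library.

```python
import json
from pathlib import Path


def read_split_metadata(dataset_path):
    """Map each split folder name to the _split value stored in its state.json.

    Hypothetical helper: it only reads the JSON files written by save_to_disk
    and assumes the layout described in this post.
    """
    root = Path(dataset_path)
    # dataset_dict.json lists the split names, e.g. {"splits": ["train", "test"]}
    splits = json.loads((root / "dataset_dict.json").read_text())["splits"]
    recorded = {}
    for name in splits:
        state = json.loads((root / name / "state.json").read_text())
        # Use .get() so a missing _split key yields None instead of raising
        recorded[name] = state.get("_split")
    return recorded
```

Calling read_split_metadata(dataset_path) on the saved folder returns a dict keyed by subfolder name, so any mismatch between the folder name and the recorded _split value is visible at a glance.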

So my question is: what is the role of the _split property of the saved dataset if state.json cannot differentiate the splits?
