Saving train/val/test datasets

Hi!

You can save them all as a dataset dictionary:

from datasets import DatasetDict, load_from_disk

dataset = DatasetDict({
    "train": train_dataset,
    "validation": validation_dataset,
    "test": test_dataset,
})

dataset.save_to_disk("path/to/dataset/dir")

# reload
dataset = load_from_disk("path/to/dataset/dir")

# access any split
train_dataset = dataset["train"]

This is especially useful when you want to keep several splits of a dataset together: they are saved under one directory and reloaded with a single call.
