Save and load datasets

My office PC is not connected to the internet, and I want to use the datasets package to load a dataset there. I first saved an existing dataset using the following code:

from datasets import load_dataset
datasets = load_dataset("glue", "mrpc")
datasets.save_to_disk('glue-mrpc')

A folder is created containing a dataset_dict.json file and three folders for the train, test, and validation splits. I zipped the folder, copied it to my office PC, and then tried loading the dataset from the folder using the following code:

datasets = load_dataset('/content/glue-mrpc')

But I am getting the following error:

FileNotFoundError: Couldn't find file locally at /content/glue-mrpc/glue-mrpc.py. Please provide a valid dataset name.

I also tried pickling, but it didn't work.

Is there any way to save a dataset on one PC and load it on another?
Thank you.


When loading from a directory created by the save_to_disk function, try using the load_from_disk function instead of load_dataset.


Thank you so much, it worked.

I should have checked dir(datasets) first.
