How to load cached dataset offline?

plkmn · May 25, 2022, 4:38am

Hello, all!

My computer doesn’t have an internet connection, so I have to download the dataset on another computer first and then copy it to my offline computer.

I use the following code snippet to download the wikitext-2-raw-v1 dataset:

from datasets import load_dataset
datasets = load_dataset("wikitext", "wikitext-2-raw-v1")

I found that some cached files end up in subdirectories of ~/.cache/huggingface/.

In the ~/.cache/huggingface/modules/datasets_modules/datasets/wikitext/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126 dir I can see:
__init__.py, __pycache__, dataset_infos.json, wikitext.json, wikitext.py

In the ~/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126 dir I can see:
LICENSE dataset_info.json wikitext-test.arrow wikitext-train.arrow wikitext-validation.arrow

Do I have to copy all those files to the offline computer? Can I rename the a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126 directory to something else?

Or how to change those arrow files to csv files?

mariosasko · May 27, 2022, 12:02pm

Hi! You only need the arrow files, but instead of looking for them in cache, it’s more convenient to save the dataset to disk with save_to_disk and transfer the generated folder to another computer, where you can simply load the dataset with load_from_disk("path/to/folder").

Or how to change those arrow files to csv files?

You can use Dataset’s to_csv method for that.

plkmn · May 29, 2022, 1:07pm

Thanks for helping me!
