Hello, all!
My computer doesn't have an internet connection, so I first have to download the dataset on another computer and then copy it to my offline computer.
I use the following code snippet to download the wikitext-2-raw-v1 dataset:
from datasets import load_dataset
# Named `dataset` so the result does not shadow the `datasets` package.
dataset = load_dataset("wikitext", "wikitext-2-raw-v1")
I found that some cached files end up in subdirectories of ~/.cache/huggingface/.
In the ~/.cache/huggingface/modules/datasets_modules/datasets/wikitext/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126 directory I can see:
__init__.py, __pycache__, dataset_infos.json, wikitext.json, wikitext.py
In the ~/.cache/huggingface/datasets/wikitext/wikitext-2-raw-v1/1.0.0/a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126 directory I can see:
LICENSE, dataset_info.json, wikitext-test.arrow, wikitext-train.arrow, wikitext-validation.arrow
Do I have to copy all of those files to the offline computer? Can I rename the a241db52902eaf2c6aa732210bead40c090019a499ceb13bcbfa3f8ab646a126 directory to something else?
Alternatively, how can I convert those arrow files to csv files?