How to load a dataset locally?

I want to load a dataset locally (for example, xcopa).

For xcopa, I manually downloaded the dataset from this Link and enabled offline mode. The code is:

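import os
# Set offline mode before importing datasets so the flag is picked up
os.environ['HF_DATASETS_OFFLINE'] = '1'
from datasets import load_dataset
xcopa = load_dataset('./datasets/datasets/xcopa/xcopa.py',
                     name='et',
                     data_dir='path/to/zip',  # the keyword is data_dir, not date_dir
                     cache_dir='path/to/cache')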

But it still tries to download the zip files from the Link rather than using the local files.
So I compared xcopa.py with matinf.py (which requires a manual download). They do differ, so can the xcopa dataset be loaded locally (by manually downloading the zip files and loading them)? (Maybe in the _split_generators method of xcopa.py.)

Hi,

I’ll assume you are trying to download the dataset that way due to an error thrown by load_dataset("xcopa", "et"). It’s going to be fixed soon!

As you correctly pointed out, there are some differences in the data that are causing the error. In the meantime, you can bypass the error and download the dataset (with the newest data) as follows:

from datasets import load_dataset
dset = load_dataset("xcopa", "et", ignore_verifications=True)

Thanks for your reply! But it doesn't seem to solve the problem. I want to manually download the original xcopa zip files to local storage and load these original files locally.

For certain reasons, I have no internet access, so I used the path above: path='./datasets/datasets/xcopa/xcopa.py'.
But xcopa.py seems to require downloading the zip files remotely. (This might differ from matinf.py, which uses dl_manager.manual_dir at line 134.)

Hi,

Oh, I see. The _split_generators method is the right place to use dl_manager.manual_dir. You can find an example script that uses it here. Additionally, you can use dl_manager.extract(manual_dir) to extract the data, so you don’t have to do it manually or with the zipfile module. And to avoid the download, comment out the line with the dl_manager.download_and_extract(_URL) call.
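
For illustration, here is a minimal sketch of that idea as a small helper you could call from _split_generators in your own copy of the script. It is not the code from xcopa.py. The helper name extract_manual_archive and the default file name xcopa.zip are hypothetical, and it only assumes that dl_manager exposes manual_dir (populated from the data_dir argument of load_dataset) and extract, as described above.

```python
import os


def extract_manual_archive(dl_manager, archive_name="xcopa.zip"):
    """Extract a manually downloaded archive instead of downloading it.

    `dl_manager` only needs a `manual_dir` attribute and an `extract` method.
    `archive_name` is a hypothetical file name; use whatever you saved the zip as.
    """
    manual_dir = dl_manager.manual_dir  # comes from load_dataset(..., data_dir=...)
    if manual_dir is None:
        raise ValueError("Pass data_dir=... to load_dataset so manual_dir is set.")
    manual_dir = os.path.abspath(os.path.expanduser(manual_dir))

    # Accept either the zip file itself or the directory that contains it.
    if os.path.isdir(manual_dir):
        archive_path = os.path.join(manual_dir, archive_name)
    else:
        archive_path = manual_dir
    if not os.path.exists(archive_path):
        raise FileNotFoundError(f"Expected a local archive at {archive_path}")

    # Stands in for dl_manager.download_and_extract(_URL); nothing is fetched remotely.
    return dl_manager.extract(archive_path)
```

Inside _split_generators, the result of extract_manual_archive(dl_manager) would take the place of whatever the script currently gets back from dl_manager.download_and_extract(_URL). The rest of the method can stay the same as long as the manually downloaded zip has the same layout as the remote one.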

OK, thanks a lot!
So, with datasets/datasets/xcopa/xcopa.py from the datasets repository, I must download the xcopa zip files remotely to get the cached files, and then I can load the cached dataset locally.

If I want to manually download the xcopa zip files and load them, I must write another ‘xcopa.py’ (like the one you provided) that loads the zip files locally, right? :eyes: