Could I download the dataset manually?

Because of a connection error, I cannot download some datasets, such as librispeech, from their original URLs. I can download them manually and store them locally, though. How can I make the datasets package recognize a manually downloaded dataset? Where should I put it: somewhere like ‘~/.cache/data/librispeech’, or somewhere else? Or how can I change the datasets code so that it knows the dataset location?

Thanks!

Hi! Could you please copy and paste the connection error you get? To work with local data, you’ll have to download the librispeech script from our repo and modify it so that it reads the data from the downloaded directory. You can pass the path to the data directory as follows:

from datasets import load_dataset
dset = load_dataset("path/to/dir/of/your/modifiedlibrispeech/script", data_dir="path/to/librispeech/data")

Inside the modified librispeech script, the data_dir value is then available as dl_manager.manual_dir:

def _split_generators(self, dl_manager):
    local_data_path = dl_manager.manual_dir
    ...
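
For what it's worth, here is a rough sketch of what the reading part of the modified script could look like. It is not the code from the actual librispeech script. It assumes the manually downloaded archive has been extracted into the usual LibriSpeech layout (split/speaker/chapter folders, each chapter holding a `*.trans.txt` transcript file next to the `.flac` audio). The function name `iter_librispeech_examples` and the example field names are made up for illustration.

```python
import os
from typing import Dict, Iterator, Tuple


def iter_librispeech_examples(split_dir: str) -> Iterator[Tuple[str, Dict[str, str]]]:
    """Yield (key, example) pairs from a locally extracted LibriSpeech split.

    Hypothetical helper: assumes each chapter folder contains a
    "<speaker>-<chapter>.trans.txt" file whose lines look like
    "<utterance_id> <TRANSCRIPT>", with "<utterance_id>.flac" beside it.
    """
    # Sort directories and files so examples come out in a stable order.
    for root, _dirs, files in sorted(os.walk(split_dir)):
        for name in sorted(files):
            if not name.endswith(".trans.txt"):
                continue
            with open(os.path.join(root, name), encoding="utf-8") as f:
                for line in f:
                    line = line.strip()
                    if not line:
                        continue
                    # The first token is the utterance id, the rest is the transcript.
                    utt_id, _, text = line.partition(" ")
                    speaker_id, chapter_id = utt_id.split("-")[:2]
                    yield utt_id, {
                        "file": os.path.join(root, utt_id + ".flac"),
                        "text": text,
                        "speaker_id": speaker_id,
                        "chapter_id": chapter_id,
                    }
```

In the modified script, `_split_generators` could then pass something like `os.path.join(dl_manager.manual_dir, "train-clean-100")` through `gen_kwargs`, and `_generate_examples` could simply delegate to this helper. The split folder name depends on which archive you downloaded, so adjust it to match your local copy.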