Mapama247
It does not work.
First, it could not download the dataset directly. Second, even the code above does not work.
For instance, after downloading xsum.py, I tried to load the XSum dataset with the following code.
from datasets import load_dataset
raw_datasets = load_dataset("./xsum.py", "raw_datasets", split="train")
It fails with the following error:
FileNotFoundError: Local file data/XSUM-EMNLP18-Summary-Data-Original.tar.gz doesn't exist
I found this line in xsum.py:
# From https://github.com/EdinburghNLP/XSum/issues/12
_URL_DATA = "data/XSUM-EMNLP18-Summary-Data-Original.tar.gz"
Option 1:
The dataset downloads, but a ReadError is raised during "Generating train split", if I replace the _URL_DATA line above with the following:
_URL_DATA = "http://bollin.inf.ed.ac.uk/public/direct/XSUM-EMNLP18-Summary-Data-Original.tar.gz"
Option 2:
After adding the following ssl code, the original load_dataset call works:
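One way to narrow down the ReadError in Option 1 is to check what the downloaded file actually contains. The sketch below assumes the error comes from Python's tarfile module, which would mean the saved file is not a valid archive (for example, an HTML redirect or error page stored under the .tar.gz name). That cause is a guess and is not confirmed in this thread. The function name describe_download is hypothetical.

```python
import tarfile
from pathlib import Path


def describe_download(path):
    """Return a rough diagnosis of a downloaded file (hypothetical helper)."""
    path = Path(path)
    if not path.exists():
        return "missing"
    # is_tarfile() tries plain and compressed (gz, bz2, xz) tar formats.
    if tarfile.is_tarfile(str(path)):
        return "tar archive"
    # Peek at the first bytes to see whether a web page was saved instead.
    with path.open("rb") as handle:
        head = handle.read(256).lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "html page"
    return "unknown"
```

Calling describe_download on the cached archive should show whether the file is a real tar archive or something else saved under that name.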
import ssl
try:
    _create_unverified_https_context = ssl._create_unverified_context
except AttributeError:
    pass
else:
    ssl._create_default_https_context = _create_unverified_https_context
# The xsum dataset is stored in .cache/huggingface/datasets/xsum
from datasets import load_dataset
raw_datasets = load_dataset("xsum", split="train")
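The ssl override in Option 2 disables certificate checks for the rest of the process. A minimal sketch of a narrower variant is a context manager that applies the same override only for the duration of one call and then restores the previous setting. The name unverified_https is hypothetical, and whether a given version of datasets honors this hook at all is an assumption based only on the report above that the override worked.

```python
import contextlib
import ssl


@contextlib.contextmanager
def unverified_https():
    """Temporarily make the default HTTPS context skip certificate checks."""
    # Both attributes are private CPython hooks, the same ones used above.
    original = ssl._create_default_https_context
    ssl._create_default_https_context = ssl._create_unverified_context
    try:
        yield
    finally:
        # Restore verification even if the wrapped call raises.
        ssl._create_default_https_context = original


# Intended usage (not run here, since it needs network access):
# with unverified_https():
#     raw_datasets = load_dataset("xsum", split="train")
```

Skipping certificate verification removes protection against tampered downloads, so keeping the override as short-lived as possible seems the safer choice.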
Still, this is not a direct method, and it does not save the code locally.
The direct download method is as follows:
$ wget http://bollin.inf.ed.ac.uk/public/direct/XSUM-EMNLP18-Summary-Data-Original.tar.gz --no-check-certificate
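For anyone without wget, the same download can be sketched in Python with the standard library alone. This is a rough equivalent, not the exact behavior of wget: the function name download_unverified is hypothetical, and the context settings below are meant to mirror --no-check-certificate, which only matters if the request ends up on an HTTPS URL.

```python
import shutil
import ssl
import urllib.request


def download_unverified(url, dest):
    """Stream url to dest without verifying TLS certificates (hypothetical helper)."""
    # Equivalent in spirit to wget --no-check-certificate: no hostname
    # check and no certificate validation. Use only for sources you trust.
    context = ssl.create_default_context()
    context.check_hostname = False
    context.verify_mode = ssl.CERT_NONE
    with urllib.request.urlopen(url, context=context) as response:
        with open(dest, "wb") as out:
            # Copy in chunks so a large archive is never held fully in memory.
            shutil.copyfileobj(response, out)
    return dest


# Intended usage (not run here, since it needs network access):
# download_unverified(
#     "http://bollin.inf.ed.ac.uk/public/direct/XSUM-EMNLP18-Summary-Data-Original.tar.gz",
#     "XSUM-EMNLP18-Summary-Data-Original.tar.gz",
# )
```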
However, it is not easy to find such a download link for every dataset on HuggingFace.