Downloading a dataset files locally

mapama247 · December 25, 2022, 10:12am

You can use the wget command followed by the file’s URL, which should have the following format: <HUB_REPO_URL>/resolve/main/<FILE_NAME>. If you are unsure about the exact URL, you can just go to the “Files and versions” section and right-click the little arrow next to the file size to select the “Copy link address” option.

For instance, this would be a way to download the MRPC corpus that you mention:

wget https://huggingface.co/datasets/glue/resolve/main/dataset_infos.json
wget https://huggingface.co/datasets/glue/resolve/main/glue.py
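If wget is not available, the same URL pattern can be sketched in Python using only the standard library. The helper names `resolve_url` and `download` are hypothetical (they are not part of any Hugging Face library), and the sketch assumes the file lives on the `main` branch, as in the commands above.

```python
from pathlib import Path
from urllib.parse import quote
from urllib.request import urlretrieve


def resolve_url(repo_url: str, filename: str, revision: str = "main") -> str:
    """Build <HUB_REPO_URL>/resolve/<revision>/<FILE_NAME> (hypothetical helper)."""
    return f"{repo_url.rstrip('/')}/resolve/{revision}/{quote(filename)}"


def download(repo_url: str, filename: str, dest_dir: str = ".") -> Path:
    """Fetch one file into dest_dir and return its local path (hypothetical helper)."""
    target = Path(dest_dir) / Path(filename).name
    urlretrieve(resolve_url(repo_url, filename), str(target))
    return target


if __name__ == "__main__":
    repo = "https://huggingface.co/datasets/glue"
    # Only print the URLs here; call download(repo, name) to actually fetch a file.
    for name in ("dataset_infos.json", "glue.py"):
        print(resolve_url(repo, name))
```

Calling `download(repo, "glue.py")` would then save the script to the current directory, much like the wget command does.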

Then, from the directory containing the downloaded files, start Python and load the dataset from the local script:

from datasets import load_dataset
mrpc = load_dataset("./glue.py", "mrpc")
