How to download files stored in repo of dataset script?

patrickvonplaten · March 7, 2022, 12:41pm

Let’s say I want to upload a private dataset to the Hugging Face Hub, under patrickvonplaten/dataset_new/. The dataset repo contains both data files and a dataset_new.py loading script.
So I want to load the dataset with

load_dataset("patrickvonplaten/dataset_new", use_auth_token=True)

The dataset repo will contain some metadata files that are needed to split the actual data, which is downloaded from an external link. Let’s call one of these files patrickvonplaten/dataset_new/splits.txt

This means that in the dataset script dataset_new.py, I first download and extract the data from the external link:

archive_path = dl_manager.download_and_extract("<external_link>")

Now I also need the splits.txt file - how do I load it?
I can’t just do:

splits_path = dl_manager.download("https://huggingface.co/datasets/patrickvonplaten/dataset_new/raw/main/splits.txt")

since it’s a private dataset and also it’s probably not the cleanest way of loading data.

Any ideas on what should be used here? @lhoestq @mariosasko @albertvillanova

patrickvonplaten · March 7, 2022, 12:55pm

Replying on behalf of @lhoestq,

It’s actually as simple as just writing:

splits_path = dl_manager.download("splits.txt")

The relative path is automatically resolved against the dataset repo, so this downloads the splits file from there.
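To make the pattern concrete, here is a minimal sketch of what the script could do with the downloaded file. In a real loading script the `dl_manager` is supplied by `datasets` to `_split_generators`. To keep the example runnable on its own, it uses `StubDownloadManager`, a hypothetical stand-in that only mimics resolving a relative path against a base folder and is not the library's implementation. The `read_split_ids` helper and the `<split> <example_id>` line format of splits.txt are also assumptions made for illustration.

```python
import os
import tempfile


class StubDownloadManager:
    """Hypothetical stand-in for `dl_manager`; NOT the `datasets` implementation."""

    def __init__(self, base_path):
        # In the real library the base path would point at the dataset repo.
        self.base_path = base_path

    def download(self, url_or_path):
        # Full URLs pass through untouched; relative paths are joined to the base path.
        if "://" in url_or_path:
            return url_or_path
        return os.path.join(self.base_path, url_or_path)


def read_split_ids(dl_manager):
    """Group example ids by split, assuming lines of the form `<split> <example_id>`."""
    # Same call as in the answer above: a path relative to the repo root.
    splits_path = dl_manager.download("splits.txt")
    split_ids = {}
    with open(splits_path, encoding="utf-8") as f:
        for line in f:
            if not line.strip():
                continue  # skip blank lines
            split, example_id = line.split()
            split_ids.setdefault(split, []).append(example_id)
    return split_ids


if __name__ == "__main__":
    # Simulate a repo folder that contains splits.txt next to the loading script.
    with tempfile.TemporaryDirectory() as repo_dir:
        with open(os.path.join(repo_dir, "splits.txt"), "w", encoding="utf-8") as f:
            f.write("train utt_001\ntrain utt_002\ntest utt_003\n")
        print(read_split_ids(StubDownloadManager(repo_dir)))
```

In an actual script, the resulting mapping could then be passed to each split's generator arguments so that only the matching examples from the extracted archive are yielded.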
