Let’s say I want to upload a private dataset to the Hugging Face Hub, under
patrickvonplaten/dataset_new/. The dataset repo contains both data files and a
dataset_new.py loading script.
So I want to load the dataset from that repo with load_dataset.
The dataset repo will contain some metadata files that are needed to split the actual data of the dataset, which is downloaded from an external link, let’s call it
This means that in the dataset script dataset_new.py, I first download and extract the data from the external link:
archive_path = dl_manager.download_and_extract("<external_link>")
Now I also need the splits.txt file - how do I load it?
I can’t just do:
splits_path = dl_manager.download("https://huggingface.co/datasets/patrickvonplaten/dataset_new/raw/main/splits.txt")
since it’s a private dataset, and hard-coding the Hub URL is probably not the cleanest way of loading data anyway.
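One direction that might work, sketched here rather than confirmed: as far as I understand the datasets documentation, files that live in the same repo as the loading script can be passed to the download manager as relative paths instead of full URLs. The helper name resolve_paths and the FakeDownloadManager stand-in below are hypothetical and exist only so the snippet runs without network access; in the real script the two calls would sit inside _split_generators, and the private repo would still need an auth token passed at load time.

```python
# Sketch only: resolve_paths is a hypothetical helper, not part of `datasets`.
def resolve_paths(dl_manager):
    """Return local paths for the external archive and the in-repo splits file."""
    # Same call as in the post: fetch and unpack the externally hosted data.
    archive_path = dl_manager.download_and_extract("<external_link>")
    # Assumption: a relative path is resolved against the dataset repo that
    # hosts dataset_new.py, so no hard-coded Hub URL is needed.
    splits_path = dl_manager.download("splits.txt")
    return archive_path, splits_path


class FakeDownloadManager:
    """Stand-in used only to exercise the helper without network access."""

    def download(self, url_or_path):
        return f"/cache/{url_or_path}"

    def download_and_extract(self, url_or_path):
        return f"/cache/extracted/{url_or_path}"


archive_path, splits_path = resolve_paths(FakeDownloadManager())
print(archive_path)
print(splits_path)
```

If the relative-path assumption holds, the script never has to spell out the raw Hub URL, so the private-repo problem reduces to authenticating once when the dataset is loaded.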