Using load_datasets for newly created datasets

Sam2021 · August 26, 2021, 8:19pm

Hi all

I have created a new public dataset here (Sam2021/Arguement_Mining_CL2017 · Datasets at Hugging Face) and added a loading script that links to the actual data on GitHub.

But I am not able to load the data remotely using:
data = load_datasets('Sam2021/Arguement_Mining_CL2017')

Can anyone please let me know what I am doing wrong?

lhoestq · August 27, 2021, 10:10am

Hi! If your data files are in the dataset repository itself, you can just pass their file names (as paths relative to the repository root) to the download manager:

import datasets

# Paths relative to the root of the dataset repository.
_TRAINING_FILE = "train.txt"
_TEST_FILE = "test.txt"

    # Method of the dataset builder class in your loading script.
    def _split_generators(self, dl_manager):
        """Returns SplitGenerators."""
        urls_to_download = {"train": _TRAINING_FILE, "test": _TEST_FILE}
        # Returns a dict with the same keys, mapping to local file paths.
        downloaded_files = dl_manager.download_and_extract(urls_to_download)
        return [
            datasets.SplitGenerator(name=datasets.Split.TRAIN, gen_kwargs={"filepath": downloaded_files["train"]}),
            datasets.SplitGenerator(name=datasets.Split.TEST, gen_kwargs={"filepath": downloaded_files["test"]}),
        ]

Under the hood, it will download each file from a URL like this one:
https://huggingface.co/datasets/Sam2021/Arguement_Mining_CL2017/resolve/main/train.txt
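To make the mapping from a relative file name to that URL concrete, here is a minimal sketch. The helper name resolve_dataset_file_url and the HUB_ENDPOINT constant are hypothetical and only mirror the URL pattern shown above; this is not the library's internal code, and the "main" revision is assumed from the example URL.

```python
from urllib.parse import quote

# Assumption: the public Hub endpoint, as seen in the example URL above.
HUB_ENDPOINT = "https://huggingface.co"


def resolve_dataset_file_url(repo_id, filename, revision="main"):
    """Build a download URL for a file stored in a dataset repository.

    Hypothetical helper: it only reproduces the pattern
    <endpoint>/datasets/<repo_id>/resolve/<revision>/<filename>.
    """
    return (
        f"{HUB_ENDPOINT}/datasets/{repo_id}/resolve/"
        f"{quote(revision, safe='')}/{quote(filename)}"
    )


# The same relative names that the loading script passes to dl_manager.
urls_to_download = {"train": "train.txt", "test": "test.txt"}
resolved = {
    split: resolve_dataset_file_url("Sam2021/Arguement_Mining_CL2017", name)
    for split, name in urls_to_download.items()
}

for split, url in resolved.items():
    print(split, url)
```

Pasting one of the printed URLs into a browser is a quick way to check that the file really exists in the repository under that name. Note also that the datasets library exposes the loading function as load_dataset (singular), so the call would be load_dataset("Sam2021/Arguement_Mining_CL2017").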

Sam2021 · August 27, 2021, 10:41am

Thanks! I have resolved it now; it was a stupid mistake on my part.
