Using load_dataset for newly created datasets

Hi all

I have created a new public dataset here (Sam2021/Arguement_Mining_CL2017 · Datasets at Hugging Face) and added a loading script that links to the actual data on GitHub.

But I am not able to load the data remotely using:
from datasets import load_dataset
data = load_dataset('Sam2021/Arguement_Mining_CL2017')

Can anyone please let me know what I am doing wrong?

Hi! If your data files are stored in the dataset repository itself, you can just pass the names of the data files to be downloaded:

# File names relative to the root of the dataset repository
_TRAINING_FILE = "train.txt"
_TEST_FILE = "test.txt"

    # Method of the dataset builder class in your loading script
    def _split_generators(self, dl_manager):
        """Returns SplitGenerators."""
        urls_to_download = {"train": _TRAINING_FILE, "test": _TEST_FILE}
        # Relative names are resolved against the dataset repository
        downloaded_files = dl_manager.download_and_extract(urls_to_download)
        return [
            datasets.SplitGenerator(name=datasets.Split.TRAIN, gen_kwargs={"filepath": downloaded_files["train"]}),
            datasets.SplitGenerator(name=datasets.Split.TEST, gen_kwargs={"filepath": downloaded_files["test"]}),
        ]

Under the hood, it will download each file from a URL like this one:
https://huggingface.co/datasets/Sam2021/Arguement_Mining_CL2017/resolve/main/train.txt
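
To make the mapping concrete, here is a rough sketch of how a relative file name could turn into a URL of that shape. The helper `resolve_repo_file_url` and the constant `HUB_ENDPOINT` are hypothetical names made up for illustration, and the default `main` revision is an assumption taken from the URL above. The library does this resolution internally and its actual code may look different.

```python
from urllib.parse import quote

# Assumed base address, taken from the example URL above
HUB_ENDPOINT = "https://huggingface.co"


def resolve_repo_file_url(repo_id: str, filename: str, revision: str = "main") -> str:
    """Build a download URL for a file stored in a dataset repository.

    Hypothetical helper, not part of the `datasets` library.
    """
    # Percent-encode the revision and file name so unusual characters stay valid in a URL
    safe_revision = quote(revision, safe="")
    safe_filename = quote(filename)
    return f"{HUB_ENDPOINT}/datasets/{repo_id}/resolve/{safe_revision}/{safe_filename}"


if __name__ == "__main__":
    # Print the URL that corresponds to each file name used in the loading script
    for name in ("train.txt", "test.txt"):
        print(resolve_repo_file_url("Sam2021/Arguement_Mining_CL2017", name))
```

If opening the printed URL in a browser does not return the file, the file name in the loading script probably does not match what is actually in the repository.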

Thanks! I have resolved it now. It was a stupid mistake on my part.