`load_dataset`: how to extract only the validation split?

jmsdao · March 13, 2023, 2:18am

When I run something like `dataset = load_dataset('trivia_qa', 'rc', split='validation')`, I see a tqdm progress bar extracting the train split. Is there a way to skip this?

mariosasko · March 15, 2023, 3:31pm

Hi! No, but we plan to address this soon - this will likely require introducing a new script structure, which is why we haven’t implemented it yet.

In the meantime, you can download the trivia_qa script (trivia_qa.py) from the Hub, replace

    return [
        datasets.SplitGenerator(
            name=name,
            gen_kwargs={
                "files": _qa_files(file_paths, cfg.sources, name, cfg.unfiltered),
                "web_dir": web_evidence_dir,
                "wiki_dir": wiki_evidence_dir,
            },
        )
        for name in [datasets.Split.TRAIN, datasets.Split.VALIDATION, datasets.Split.TEST]
    ]

with

    return [
        datasets.SplitGenerator(
            name=name,
            gen_kwargs={
                "files": _qa_files(file_paths, cfg.sources, name, cfg.unfiltered),
                "web_dir": web_evidence_dir,
                "wiki_dir": wiki_evidence_dir,
            },
        )
        for name in [datasets.Split.VALIDATION]
    ]

and then run load_dataset("path/to/script").
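
If you would rather not edit the file by hand, the same replacement can be scripted. This is only a minimal sketch: it assumes you already have a local copy of trivia_qa.py, that the split list in it is written exactly as in the snippet above, and that your installed version of datasets still supports loading scripts. The helper names `restrict_to_validation` and `patch_script` and the file paths are made up for illustration and are not part of the datasets library.

```python
from pathlib import Path

# The line to look for and its replacement, copied from the snippets above.
ALL_SPLITS = "for name in [datasets.Split.TRAIN, datasets.Split.VALIDATION, datasets.Split.TEST]"
VALIDATION_ONLY = "for name in [datasets.Split.VALIDATION]"


def restrict_to_validation(script_text: str) -> str:
    """Return the script source with the split list reduced to validation only."""
    if ALL_SPLITS not in script_text:
        # Fail loudly if the script on the Hub has changed and the line is gone.
        raise ValueError("expected split list not found in the script")
    return script_text.replace(ALL_SPLITS, VALIDATION_ONLY, 1)


def patch_script(src: str, dst: str) -> None:
    """Read the script at src (hypothetical path) and write the patched copy to dst."""
    patched = restrict_to_validation(Path(src).read_text(encoding="utf-8"))
    Path(dst).write_text(patched, encoding="utf-8")


# Example usage (paths are placeholders):
#   patch_script("trivia_qa.py", "trivia_qa_val/trivia_qa_val.py")
#   from datasets import load_dataset
#   dataset = load_dataset("trivia_qa_val/trivia_qa_val.py", "rc", split="validation")
```

The patched script only declares the validation split, so `split="validation"` is the only split you can request from it.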

jmsdao · March 15, 2023, 9:50pm

Amazing, thanks!
