How can I download a specific split of a dataset?

I’m trying to download just the test and validation splits of the following dataset: yuvalkirstain/pickapic_v2 at main.

I can tell that the files are named according to the bottommost scheme here: File names and splits. But, when I use load_dataset with split="validation", it still downloads the entire dataset. Why?

Is load_dataset not smart enough to download only the files that have validation in the name?

Yep, correct. For now, if you really don't want to download everything, you have to either pass data_files= to select which files to download, or use streaming.

Contributions to improve this are welcome though! The repository is huggingface/datasets on GitHub.
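For anyone landing here later, the two workarounds mentioned above (data_files= and streaming) could look roughly like the sketch below. The helper names split_patterns, load_selected_splits and stream_split are made up for illustration. The data/<split>-*.parquet pattern is an assumption about how the shards in the repo are named, so check the file listing of the dataset before relying on it.

```python
def split_patterns(splits, data_dir="data", ext="parquet"):
    """Map each split name to a glob pattern for its shard files.

    Assumes shards are named <data_dir>/<split>-*.<ext>; adjust to the
    actual file names in the dataset repo.
    """
    return {split: f"{data_dir}/{split}-*.{ext}" for split in splits}


def load_selected_splits(repo_id, splits):
    """Download only the files matching the requested splits."""
    from datasets import load_dataset  # imported lazily; needs `pip install datasets`

    # With a dict of data_files, only the matching files are fetched and
    # the result is a DatasetDict keyed by the names given here.
    return load_dataset(repo_id, data_files=split_patterns(splits))


def stream_split(repo_id, split):
    """Iterate over one split without downloading the dataset up front."""
    from datasets import load_dataset

    # streaming=True returns an IterableDataset that reads data on the fly.
    return load_dataset(repo_id, split=split, streaming=True)


# Example usage (requires network access):
# ds = load_selected_splits("yuvalkirstain/pickapic_v2", ["validation", "test"])
# first_row = next(iter(stream_split("yuvalkirstain/pickapic_v2", "validation")))
```

The data_files route gives you a regular on-disk dataset for just those splits, while streaming avoids the download entirely at the cost of sequential access only.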