Efficiently slicing imagefolder dataset split

mariosasko · July 21, 2022, 1:53pm

Hi! You can access the first 100 examples in the streaming mode as follows:

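import datasets

# Glob pattern matching every file under the ImageNet train directory
data_files = {'train': ['../ImageNet/train/**']}
# streaming=True returns an IterableDataset, so files are read lazily instead of downloaded up front
dset = datasets.load_dataset('imagefolder', split='train', data_files=data_files, streaming=True, task="image-classification")
# Keep only the first 100 examples of the stream
dset = dset.take(100)
for ex in dset:
    ...  # process each example here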

The train[:100] split syntax is currently not supported in streaming mode, but we plan to add it at some point (see issue #2962 in huggingface/datasets on GitHub, "Enable splits during streaming the dataset"). In non-streaming mode, we always download all the data instead of only the data needed to build the requested split. This is a well-known limitation of datasets, and we plan to address it soon.
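Until slicing is supported in streaming mode, a train[start:stop] slice can be emulated by skipping and then taking from the stream. The sketch below illustrates the idea in plain Python with itertools.islice; stream_slice and fake_stream are hypothetical names, and the generator only stands in for a real streaming dataset.

```python
from itertools import islice
from typing import Iterable, Iterator, Optional, TypeVar

T = TypeVar("T")


def stream_slice(examples: Iterable[T], start: int = 0, stop: Optional[int] = None) -> Iterator[T]:
    """Lazily yield the items at positions start..stop-1 of a stream.

    Hypothetical helper: it mirrors skip(start).take(stop - start)
    on a streaming (iterable) dataset.
    """
    if start < 0 or (stop is not None and stop < start):
        raise ValueError("only non-negative, forward slices are supported on a stream")
    # islice reads and discards the first `start` items, then stops after `stop`,
    # so nothing past position `stop` is ever pulled from the underlying iterator.
    return islice(examples, start, stop)


def fake_stream() -> Iterator[dict]:
    # Hypothetical stand-in for a streaming dataset: an unbounded generator of examples.
    i = 0
    while True:
        yield {"idx": i, "label": i % 10}
        i += 1


# Equivalent of "train[:100]": the first 100 examples.
first_100 = list(stream_slice(fake_stream(), 0, 100))
# Equivalent of "train[100:110]": skip 100 examples, then take 10.
next_10 = list(stream_slice(fake_stream(), 100, 110))
```

On a streaming dataset the same effect comes from chaining dset.skip(100).take(10). Note that skipping still has to read through the skipped examples, so large offsets are not free.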
