Hi! You can access the first 100 examples in the streaming mode as follows:
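import datasets

# Glob pattern matching every file under the training directory
data_files = {'train': ['../ImageNet/train/**']}
# streaming=True returns an iterable dataset, which is what makes this "streaming mode"
dset = datasets.load_dataset('imagefolder', split='train', data_files=data_files, task="image-classification", streaming=True)
dset = dset.take(100)  # lazily keep only the first 100 examples
for ex in dset:
    ...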
The train[:100] syntax is currently not supported in the streaming mode, but we plan to add it at some point (see Enable splits during streaming the dataset · Issue #2962 · huggingface/datasets · GitHub). And in the non-streaming mode, we always download all the data instead of downloading only the data needed to build the requested split. This is a well-known limitation of datasets, and we plan to address it soon.
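To make the difference between the two modes concrete, here is a rough standard-library analogy. It is only a sketch, not how datasets is implemented: stream_examples and the counter dict are hypothetical stand-ins for a streamed dataset, and itertools.islice plays the role of take(100). The point is that lazily taking the first 100 items pulls exactly 100 examples from the source and nothing more.

```python
from itertools import islice


def stream_examples(counter):
    """Hypothetical stand-in for a streamed dataset: yields examples one at a time."""
    i = 0
    while True:
        counter["produced"] += 1  # count how many examples were actually produced
        yield {"image_id": i, "label": i % 10}
        i += 1


counter = {"produced": 0}

# Analogous to dset.take(100): stop pulling from the source after 100 examples
first_100 = list(islice(stream_examples(counter), 100))

print(len(first_100), counter["produced"])
```

In non-streaming mode the situation is the opposite: all of the data is fetched first, and the slice is applied afterwards.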