Load a subset of a dataset

jimothyhalpert7 · April 17, 2023, 8:02pm

I want to speed up the "Generating train split:" step of loading the cornell_movie_dialog dataset.
I figured that if I could load only a subset of the dataset, the generation would go faster.

Is it possible to do that?

surya-narayanan · April 18, 2023, 9:12pm

You should be able to index into a dataset, so try cornell_movie_dialog[:1000] and see if that works.

jimothyhalpert7 · April 19, 2023, 4:27pm

It doesn’t. The "Generating train split:" step still goes through all 80-ish thousand entries.
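
Indexing happens only after the split has been fully generated, which would explain why it does not shorten that step. One thing that might help is streaming mode, where examples are yielded lazily and reading can stop early. Below is a minimal sketch, not a confirmed fix: it assumes the `datasets` library is installed and that this dataset's loader supports `streaming=True` (not verified here). `take_first` and `load_subset` are hypothetical helper names.

```python
from itertools import islice


def take_first(examples, n):
    """Collect the first n items of any iterable without consuming the rest."""
    # islice stops pulling from the source once n items have been yielded.
    return list(islice(examples, n))


def load_subset(name, n, split="train"):
    """Hypothetical helper: fetch the first n examples of a split via streaming.

    Assumes the dataset can be loaded with streaming=True.
    """
    # Imported lazily so this module still imports where `datasets` is absent.
    from datasets import load_dataset

    # In streaming mode load_dataset returns an iterable dataset that yields
    # examples one at a time instead of generating the whole split up front.
    stream = load_dataset(name, split=split, streaming=True)
    return take_first(stream, n)
```

Something like `load_subset("cornell_movie_dialog", 1000)` would be the call to try. Whether it works depends on the installed `datasets` version and on the dataset's loading script being streamable.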
