Load a subset of a dataset

I want to speed up the

Generating train split:

step of loading the cornell_movie_dialog dataset.
I figured that if I could load only a subset of the dataset, the generation should go faster.

Is it possible to do that?

You should be able to index into a dataset, so try cornell_movie_dialog[:1000] and see if that works.

It doesn’t. The

Generating train split:

step still goes through all 80-ish thousand entries.
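
One thing that might be worth trying is streaming mode. As far as I know, slicing (either on the loaded dataset or with a split string like train[:1000]) is applied only after the whole split has been generated, which would explain the behavior above. With streaming=True, load_dataset returns an iterable that yields examples lazily, so you can stop after the first 1000. Below is a minimal sketch, not a tested recipe for this particular dataset: first_n and load_head are hypothetical helper names, and whether cornell_movie_dialog can be streamed (or needs extra arguments) is an assumption that depends on your installed datasets version.

```python
from itertools import islice


def first_n(examples, n):
    """Return the first n items of any iterable without consuming the rest."""
    return list(islice(examples, n))


def load_head(name, n=1000, split="train"):
    """Stream `split` of dataset `name` and keep only the first n examples.

    Hypothetical helper: assumes the dataset supports streaming.
    """
    # Imported here so first_n stays usable without the library installed.
    from datasets import load_dataset

    # streaming=True yields examples lazily instead of preparing the full
    # split on disk first, so only the examples we consume are read.
    stream = load_dataset(name, split=split, streaming=True)
    return first_n(stream, n)


# Example call (needs network access):
# subset = load_head("cornell_movie_dialog", n=1000)
```

The result is a plain list of example dicts rather than a Dataset object, so if you need the usual Dataset methods afterwards you would have to convert it back yourself.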