How to slice an already loaded Dataset?

Using Datasets version '2.6.1', how do I take a Dataset object and slice it (for example, only the first 100 examples) and still get a Dataset?

The object returned by load_dataset, dd, is a DatasetDict.
dd['train'] is a Dataset.

If I do dd['train'][:100], I get a dict, not a Dataset object anymore. I also can't create a new Dataset with datasets.Dataset(dd['train'][:100]).

I have seen some examples in the forum of a method called take.

But Dataset.take does not seem to exist anymore.

I guess that was for an older version of Datasets. How do I slice an already loaded Dataset?

Got it. The method is now select.


Yes, select is the correct answer.

Btw, Dataset.take was never a part of the API, but we may add it eventually to be consistent with the IterableDataset, which is returned in the streaming mode.
