Desired behavior when calling `shuffle` or `select` on `interleave_datasets`

Could someone explain how these operations behave on the output of `interleave_datasets`? I couldn't find anything about this in the docs. Thanks.

Hi! You can find the documentation of `interleave_datasets` here:

Interleaving datasets is a way to alternate between the examples of several datasets. For example:

from datasets import Dataset, interleave_datasets

d1, d2 = {"foo": ["a", "b", "c"]}, {"foo": ["x", "y", "z"]}
d1, d2 = Dataset.from_dict(d1), Dataset.from_dict(d2)
d3 = interleave_datasets([d1, d2])
print(d3["foo"])
# ['a', 'x', 'b', 'y', 'c', 'z']
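To make the default alternation concrete, here is a plain-Python model of it. This is a sketch, not the library's code. It assumes the default settings (no `probabilities` argument), where each round takes one example from every dataset in turn and interleaving stops once the shortest dataset runs out. `interleave_lists` is a hypothetical helper used only for illustration.

```python
def interleave_lists(*sources):
    """Hypothetical helper: round-robin over the sources, one item per source per round.

    Models the default behavior (no `probabilities`), which stops once the
    shortest source is exhausted, so every round is complete.
    """
    rounds = min(len(src) for src in sources)
    return [src[i] for i in range(rounds) for src in sources]


print(interleave_lists(["a", "b", "c"], ["x", "y", "z"]))
# ['a', 'x', 'b', 'y', 'c', 'z']
print(interleave_lists(["a", "b", "c"], ["x", "y"]))
# ['a', 'x', 'b', 'y']
```

The second call shows that with unequal lengths the extra example `'c'` is dropped under this default.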

You can call `shuffle` or `select` on the resulting dataset, just like on any other `Dataset`.
`shuffle` randomly permutes the order of all its examples:

print(d3.shuffle(seed=42)["foo"])
# ['y', 'b', 'z', 'c', 'x', 'a']
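Note that shuffling the interleaved dataset permutes all rows together, so the alternation between sources is not preserved; only the set of examples is. If you instead want each source shuffled while keeping the alternation, shuffle the sources first and then interleave, e.g. `interleave_datasets([d1.shuffle(seed=42), d2.shuffle(seed=42)])`. The plain-Python sketch below contrasts the two orders. It uses `random.Random` as a stand-in, so its orderings will not match what `Dataset.shuffle(seed=42)` produces, and `interleave_lists` is the same hypothetical round-robin helper as a model of the default interleaving.

```python
import random


def interleave_lists(*sources):
    # Hypothetical helper: round-robin model of the default interleaving.
    rounds = min(len(src) for src in sources)
    return [src[i] for i in range(rounds) for src in sources]


d1, d2 = ["a", "b", "c"], ["x", "y", "z"]

# Shuffle AFTER interleaving: all rows are permuted together, so two
# examples from the same source can end up next to each other.
after = interleave_lists(d1, d2)
random.Random(42).shuffle(after)

# Shuffle each source BEFORE interleaving: the order inside each source
# changes, but positions still alternate d1, d2, d1, d2, ...
s1, s2 = d1[:], d2[:]
random.Random(42).shuffle(s1)
random.Random(43).shuffle(s2)
before = interleave_lists(s1, s2)

print(after)
print(before)
```

Both orders keep exactly the same six examples; they differ only in whether the d1/d2 alternation survives.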

`select` keeps only the examples at the indices you pass:

print(d3.select([0, 1, 2])["foo"])
# ['a', 'x', 'b']
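`select` behaves like indexing with a list of positions: it returns the rows at those indices, in the order you list them. Combined with `shuffle`, this gives a common way to draw a random subset, e.g. `d3.shuffle(seed=42).select(range(2))`. Below is a plain-Python model of the same idea. It is a sketch, and `select_rows` is a hypothetical name; `random.Random` stands in for the library's shuffling, so the subset will differ from what `Dataset.shuffle` gives.

```python
import random


def select_rows(rows, indices):
    # Hypothetical helper: keep only the rows at `indices`, in the order given.
    return [rows[i] for i in indices]


d3 = ["a", "x", "b", "y", "c", "z"]  # the interleaved "foo" column from above

print(select_rows(d3, [0, 1, 2]))  # ['a', 'x', 'b']
print(select_rows(d3, [5, 0]))     # ['z', 'a']

# Random subset of 2 rows: shuffle a copy, then select the first positions.
shuffled = d3[:]
random.Random(42).shuffle(shuffled)
subset = select_rows(shuffled, range(2))
print(subset)
```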