Could someone explain these operations on interleave_datasets? I couldn't find anything about this in the docs. Thanks.
Hi! You can find the documentation of interleave_datasets in the datasets library docs.
Interleaving datasets can be seen as a way to alternate between examples of several datasets. For example:
from datasets import Dataset, interleave_datasets
d1, d2 = {"foo": ["a", "b", "c"]}, {"foo": ["x", "y", "z"]}
d1, d2 = Dataset.from_dict(d1), Dataset.from_dict(d2)
d3 = interleave_datasets([d1, d2])
print(d3["foo"])
# ['a', 'x', 'b', 'y', 'c', 'z']
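To make the alternation concrete, here is a minimal plain-Python sketch of the same idea. It is not how the library implements it, and round_robin is a hypothetical helper name used only for illustration. It assumes the sources are ordinary lists and that interleaving stops once the shortest one runs out, which as far as I know matches the library's default when no probabilities are passed.

```python
def round_robin(*columns):
    """Take one item from each column in turn (hypothetical helper)."""
    # zip stops at the shortest column, so every source contributes
    # the same number of items.
    return [item for group in zip(*columns) for item in group]


d1 = ["a", "b", "c"]
d2 = ["x", "y", "z"]
interleaved = round_robin(d1, d2)
print(interleaved)
# ['a', 'x', 'b', 'y', 'c', 'z']
```

With sources of different lengths, this sketch drops the extra items of the longer ones.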
You can call shuffle or select on the resulting dataset if you want.
Calling shuffle randomizes the order of the rows:
print(d3.shuffle(seed=42)["foo"])
# ['y', 'b', 'z', 'c', 'x', 'a']
Calling select keeps only the rows at the indices you pass, in that order:
print(d3.select([0, 1, 2])["foo"])
# ['a', 'x', 'b']
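If it helps to see what these two operations amount to, here is a rough plain-Python analogue on an ordinary list. The helper names shuffled and selected are hypothetical, and the shuffled order will not match what the library produces for the same seed, since this sketch uses Python's random module rather than the library's own generator. Both helpers return a new list and leave the input untouched, in the same way that shuffle and select return a new dataset.

```python
import random


def shuffled(rows, seed):
    """Return a new list with the rows in a seeded random order."""
    indices = list(range(len(rows)))
    # A fixed seed makes the permutation reproducible.
    random.Random(seed).shuffle(indices)
    return [rows[i] for i in indices]


def selected(rows, indices):
    """Return a new list with only the rows at the given indices."""
    return [rows[i] for i in indices]


rows = ["a", "x", "b", "y", "c", "z"]
print(shuffled(rows, seed=42))
print(selected(rows, [0, 1, 2]))
```

The two can be chained, for example selected(shuffled(rows, seed=42), [0, 1, 2]) to take three rows from a shuffled copy.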