I often build combined datasets with "interleave_datasets". It is very useful and powerful for assembling a large dataset from several datasets on the Hugging Face Hub. But I have a question: is it possible to interleave a group of datasets so that the result follows the length of the longest dataset rather than the shortest?
Here is an example:
from datasets import interleave_datasets, Dataset
d1 = Dataset.from_dict({"a": [1, 2, 3, 4]})
d2 = Dataset.from_dict({"a": [100, 200]})
interleaved = interleave_datasets([d1, d2])
print(len(interleaved))  # prints 4
# because the result is {'a': 1}, {'a': 100}, {'a': 2}, {'a': 200}: it stops as soon as d2 runs out.
But I want to get a dataset like this:
{'a': 1}, {'a': 100}, {'a': 2}, {'a': 200}, {'a': 3}, {'a': 100}, {'a': 4}, {'a': 200}
# i.e. the length is 8, with the shorter dataset cycled.
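The desired ordering is a round-robin over the datasets that only stops once the longest one is exhausted, restarting the shorter ones as needed. A minimal pure-Python sketch of that ordering on plain lists of rows; the helper name interleave_longest is hypothetical (not part of the datasets library), and it assumes every source is non-empty:

```python
from itertools import cycle, islice


def interleave_longest(*sources):
    """Round-robin over sources until the longest is exhausted.

    Hypothetical helper for illustration only. Shorter sources are cycled
    from the start; assumes every source is a non-empty list.
    """
    longest = max(len(s) for s in sources)
    # Repeat each source indefinitely, then take exactly `longest` items from it.
    padded = [list(islice(cycle(s), longest)) for s in sources]
    # zip yields one tuple per position; flattening those tuples interleaves the sources.
    return [item for row in zip(*padded) for item in row]


d1 = [{"a": 1}, {"a": 2}, {"a": 3}, {"a": 4}]
d2 = [{"a": 100}, {"a": 200}]
result = interleave_longest(d1, d2)
print(len(result))  # 8
print(result)
```

This only illustrates the target order on in-memory lists; it does not produce a Dataset object.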
How can I do this with the Hugging Face datasets library?