I had the exact same problem. HF's `datasets.interleave_datasets()` accepts a list of `IterableDataset`s, but the `IterableDataset` it returns has `n_shards` equal to the smallest `n_shards` in the list. In your case that is 1. The workaround is to pre-process the inputs so that every `IterableDataset` in the list has `n_shards` of `n` before passing the list to `interleave_datasets()`.