Hi,
I have several datasets and want a dataloader that can sample from multiple datasets, so that iterating over the dataloader yields batch_size items from each dataset.
Is that possible?
Hi! You can use interleave_datasets for that and pass the returned dataset to the dataloader. Another option is to create one dataloader for each dataset and sample from them.
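For reference, a minimal sketch of the first suggestion (the toy datasets ds1 and ds2 below are placeholders for your real data):

```python
from datasets import Dataset, interleave_datasets
from torch.utils.data import DataLoader

# Two toy datasets standing in for the real ones.
ds1 = Dataset.from_dict({"x": list(range(0, 100)), "source": [0] * 100})
ds2 = Dataset.from_dict({"x": list(range(100, 200)), "source": [1] * 100})

# With no probabilities given, interleave_datasets alternates one example
# at a time from each dataset: ds1[0], ds2[0], ds1[1], ds2[1], ...
mixed = interleave_datasets([ds1, ds2]).with_format("torch")

loader = DataLoader(mixed, batch_size=8)
batch = next(iter(loader))
print(batch["source"])  # a mix of 0s and 1s -> examples from both datasets in one batch
```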
Aah, I think interleave_datasets will yield batch_size items overall, from a mixture of datasets, whereas I want batch_size items from each dataset. Is that possible?
interleave_datasets cycles through the given list of datasets, which means you can set the dataloader's batch size to batch_size * the number of interleaved datasets to get batch_size samples from each dataset in each iteration. Another option is to have a separate dataloader for each dataset.