Hi,
I have several datasets, and want a dataloader that can sample from multiple datasets, so that iterating over the dataloader yields `batch_size` items from each dataset.
Is that possible?
Hi! You can use `interleave_datasets` for that and pass the returned dataset to the dataloader. Another option is to create one dataloader for each dataset and sample from them.
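A minimal sketch of the first option (the datasets here are toy placeholders; substitute your own):

```python
from datasets import Dataset, interleave_datasets
from torch.utils.data import DataLoader

# Toy placeholder datasets (hypothetical; replace with your real ones).
ds_a = Dataset.from_dict({"x": [0, 1, 2, 3], "source": ["a"] * 4})
ds_b = Dataset.from_dict({"x": [4, 5, 6, 7], "source": ["b"] * 4})

# interleave_datasets alternates examples from the given datasets.
mixed = interleave_datasets([ds_a, ds_b]).with_format("torch")

loader = DataLoader(mixed, batch_size=4)
for batch in loader:
    print(batch["source"])  # a mixture of 'a' and 'b' examples
```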
Aah, I think `interleave_datasets` will yield `batch_size` items overall, from a mixture of datasets, whereas I want `batch_size` items from each dataset. Is that possible?
`interleave_datasets` cycles through the given list of datasets, which means you can set the dataloader's batch size to `batch_size * number_of_interleaved_datasets` to get `batch_size` samples from each dataset in each iteration. Another option is to have a separate dataloader for each dataset.
If we use a separate dataloader for each dataset, what will the training loop look like? In each epoch, how can we get a batch from each of the dataloaders and calculate the loss, especially when the dataloaders are not the same length?
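One common pattern is to drive the epoch with the longer dataloader and cycle the shorter one, so each step pairs one batch from each. A minimal sketch with a hypothetical model, data, and loss:

```python
import itertools
import torch
from torch.utils.data import DataLoader, TensorDataset

# Hypothetical datasets of different lengths (100 vs. 60 examples).
loader_a = DataLoader(TensorDataset(torch.randn(100, 8)), batch_size=4)
loader_b = DataLoader(TensorDataset(torch.randn(60, 8)), batch_size=4)

model = torch.nn.Linear(8, 1)  # hypothetical model
optimizer = torch.optim.SGD(model.parameters(), lr=1e-3)

for epoch in range(3):
    # Plain zip(loader_a, loader_b) would stop at the shorter loader;
    # cycling the shorter one lets the longer one define the epoch.
    for (batch_a,), (batch_b,) in zip(loader_a, itertools.cycle(loader_b)):
        optimizer.zero_grad()
        # Hypothetical loss combining a batch from each dataset.
        loss = model(batch_a).mean() + model(batch_b).mean()
        loss.backward()
        optimizer.step()
```

One caveat: `itertools.cycle` caches the batches from its first pass, so if the shorter dataloader shuffles, that order is frozen on replay; re-creating its iterator whenever it is exhausted avoids this.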