For research purposes, I would like to control the content of each batch, i.e. choose which samples form each batch during training. Is it possible to do this using datasets/transformers?
Hi @rbagat ,
What do you mean by choosing samples yourself exactly? As in: passing a predefined set of indices to the model?
If so, you can set shuffle=False and pass the predefined indices to the batching step (for example via a custom batch sampler). If you give more details, I might be able to assist you.
Best,
M
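To make the "predefined indices" idea concrete, here is a minimal, library-free sketch. The helper name `iter_predefined_batches` and the toy data are made up for illustration. If I remember correctly, the same list of index lists can also be handed to PyTorch as `DataLoader(dataset, batch_sampler=batch_indices)`, since `batch_sampler` accepts any iterable of index lists, but treat that as an assumption about your setup.

```python
def iter_predefined_batches(dataset, batch_indices):
    """Yield batches whose order and composition follow batch_indices exactly."""
    for indices in batch_indices:
        # Look up each sample by its position in the dataset.
        yield [dataset[i] for i in indices]


# Toy stand-in for a real dataset (hypothetical data).
dataset = ["a", "b", "c", "d", "e"]
# Each inner list is one batch, chosen by hand.
batch_indices = [[0, 3], [1, 4], [2]]

batches = list(iter_predefined_batches(dataset, batch_indices))
```

Since nothing is shuffled, every epoch sees the same batches in the same order.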
Hello @mikehemberger,
Thank you for your answer. I am working with accented speech, so I would like to make sure each batch contains only one kind of accent. I do have access to accent labels. Setting shuffle to False would work, but I would like to keep shuffling the data so that the model doesn’t see the exact same batches every epoch during training.
I guess then `select`, `filter` and/or dataset interleaving (`interleave_datasets`) should work for you? You'll find documentation here:
If questions remain, a code snippet would help.
Best,
M
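In case it helps, here is a rough sketch of one way to get single-accent batches while still shuffling every epoch: group the indices by accent label, shuffle within each group, cut the groups into batches, then shuffle the batch order. This is not an official datasets/transformers recipe. The class name `AccentBatchSampler` and its parameters (`accents`, `batch_size`, `seed`) are hypothetical, and the accent labels are toy values.

```python
import random
from collections import defaultdict


class AccentBatchSampler:
    """Yield lists of dataset indices where each batch holds a single accent."""

    def __init__(self, accents, batch_size, seed=0):
        self.batch_size = batch_size
        self.rng = random.Random(seed)
        # Group dataset indices by their accent label.
        self.groups = defaultdict(list)
        for idx, accent in enumerate(accents):
            self.groups[accent].append(idx)

    def __iter__(self):
        batches = []
        for indices in self.groups.values():
            indices = indices[:]        # copy so the stored grouping is untouched
            self.rng.shuffle(indices)   # shuffle samples within one accent
            for start in range(0, len(indices), self.batch_size):
                batches.append(indices[start:start + self.batch_size])
        self.rng.shuffle(batches)       # shuffle the order of the batches
        return iter(batches)

    def __len__(self):
        # Ceiling division per accent, because the last batch may be smaller.
        return sum(-(-len(v) // self.batch_size) for v in self.groups.values())


# Toy accent labels, one per sample (hypothetical data).
accents = ["us", "uk", "us", "in", "uk", "us", "in", "us"]
sampler = AccentBatchSampler(accents, batch_size=2, seed=0)
epoch_batches = list(sampler)  # iterating again reshuffles for the next epoch
```

An object like this should be usable as `batch_sampler` in a `torch.utils.data.DataLoader`. With `Trainer` you would probably have to override `get_train_dataloader` to plug it in, which I have not tested here. Note that the last batch of each accent can be smaller than `batch_size`.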