Training on multiple datasets


I’d like to use the Trainer API to train a DistilBERT model on multiple subsets of a dataset sequentially. Specifically, my dataset is split into an ‘easy’ portion and a ‘hard’ portion. The idea is to train on the easy portion first for N epochs, and then on the hard portion for N epochs.

Most answers I found on the forums suggest the interleave_datasets or concatenate_datasets functions from Datasets. Neither helps here, because my research project is specifically about the order of training. I’ve also tried passing the model from one trainer as the pretrained model for a second trainer, but that doesn’t seem to work very well either. Is there a good way to do this?
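To make the chained-trainer idea concrete, here is a minimal sketch of how it could be written, not a confirmed solution. The names train_in_stages and make_hf_trainer, the out/<stage> output directories, and the default of 3 epochs are all hypothetical choices for illustration. One thing to keep in mind: each new Trainer builds a fresh optimizer and learning-rate schedule, so warmup and decay restart at every stage, which may or may not be what the experiment wants.

```python
def train_in_stages(model, stages, make_trainer):
    """Train the same model object on each (name, dataset) stage, in order.

    `make_trainer` is any callable returning an object with a .train() method.
    Returns a list of (stage_name, train_result) pairs in training order.
    """
    history = []
    for name, dataset in stages:
        # The model is passed by reference, so weights learned in one stage
        # are the starting point of the next stage.
        trainer = make_trainer(model, dataset, name)
        history.append((name, trainer.train()))
    return history


def make_hf_trainer(model, dataset, name, epochs=3):
    """Hypothetical factory that builds a Hugging Face Trainer for one stage."""
    # Imported lazily so train_in_stages stays usable without transformers.
    from transformers import Trainer, TrainingArguments

    args = TrainingArguments(
        output_dir=f"out/{name}",   # assumed path: one directory per stage
        num_train_epochs=epochs,    # N epochs per stage
    )
    # A new Trainer means a new optimizer and a restarted LR schedule.
    return Trainer(model=model, args=args, train_dataset=dataset)
```

With a loaded model and two tokenized datasets (here assumed to be called easy_ds and hard_ds), usage would be train_in_stages(model, [("easy", easy_ds), ("hard", hard_ds)], make_hf_trainer).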

Thanks in advance!