How to continue training on another dataset?

Hi, I want to pre-train a language model using the Trainer API.

Assume I have two datasets, wikitext and bookcorpus. I want to train on wikitext first, saving checkpoints along the way, and then continue training on bookcorpus from the final wikitext checkpoint while saving further checkpoints.

I would like the checkpoints to look something like this:

checkpoint-500 (only wikitext)
checkpoint-1000 (only wikitext)
checkpoint-1500 (only wikitext)
checkpoint-1800 (finished training on wikitext)
checkpoint-2300 (continue training on bookcorpus)
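
To make the numbering concrete, here is a small plain-Python sketch of the schedule I have in mind. It does not use the Trainer at all; `checkpoint_schedule` is a hypothetical helper, the save interval of 500 is inferred from the list above, and the 1200 steps for bookcorpus is a made-up number just for illustration.

```python
def checkpoint_schedule(stages, save_every=500):
    """Return (global_step, stage_name) pairs for stages trained back to back.

    stages: list of (name, num_steps) tuples, trained in the given order.
    A checkpoint is written every `save_every` steps counted from the start
    of each stage, plus one when the stage finishes.
    """
    schedule = []
    offset = 0  # global steps completed before the current stage
    for name, num_steps in stages:
        local_steps = list(range(save_every, num_steps + 1, save_every))
        if not local_steps or local_steps[-1] != num_steps:
            local_steps.append(num_steps)  # always save when a stage finishes
        for step in local_steps:
            schedule.append((offset + step, name))
        offset += num_steps
    return schedule


# 1800 wikitext steps matches the list above; 1200 bookcorpus steps is assumed.
for step, name in checkpoint_schedule([("wikitext", 1800), ("bookcorpus", 1200)]):
    print(f"checkpoint-{step} ({name})")
```

Note that in this sketch the save interval restarts at the beginning of the second stage, which is why the first bookcorpus checkpoint lands on 2300 rather than 2000. I am not sure whether the Trainer counts steps this way when training continues from a checkpoint.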

I don’t want to mix the two datasets, because I want to analyse how the model changes after training on the second dataset. How can I achieve this?

Could anyone help me?

@sgugger Could you please have a look?