Implementation of Two Distinct Datasets with HuggingFace Trainer Module

You could try this I think…

Override the `Trainer.get_train_dataloader()` method to return a custom iterator that wraps two different dataloaders, one for each dataset. You can iterate over each set within an epoch and handle the batch-size difference between them there. This keeps most of the flexibility in your hands while still reusing the library's training loop. In practice you would just extend the `Trainer` class with a multi-dataset trainer class, along the lines of the sketch below.
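Here is a minimal sketch of what I mean. The names `MultiDatasetTrainer`, `dataset_a`/`dataset_b`, `batch_size_a`/`batch_size_b`, and the `_ChainedLoader` helper are all my own illustrative choices, not part of the library, and it skips the sampler/distributed handling the stock `get_train_dataloader()` does:

```python
from torch.utils.data import DataLoader
from transformers import Trainer


class _ChainedLoader:
    """Yields all batches from the first loader, then all from the second."""

    def __init__(self, *loaders):
        self.loaders = loaders

    def __iter__(self):
        for loader in self.loaders:
            yield from loader

    def __len__(self):
        # Trainer uses len() of the train dataloader to compute steps per epoch.
        return sum(len(loader) for loader in self.loaders)


class MultiDatasetTrainer(Trainer):
    """Hypothetical Trainer subclass that trains on two datasets per epoch."""

    def __init__(self, *args, dataset_a=None, dataset_b=None,
                 batch_size_a=8, batch_size_b=8, **kwargs):
        super().__init__(*args, **kwargs)
        self.dataset_a = dataset_a
        self.dataset_b = dataset_b
        self.batch_size_a = batch_size_a
        self.batch_size_b = batch_size_b

    def get_train_dataloader(self):
        # One dataloader per dataset, each with its own batch size.
        loader_a = DataLoader(
            self.dataset_a,
            batch_size=self.batch_size_a,
            shuffle=True,
            collate_fn=self.data_collator,
        )
        loader_b = DataLoader(
            self.dataset_b,
            batch_size=self.batch_size_b,
            shuffle=True,
            collate_fn=self.data_collator,
        )
        # Chain them so each epoch runs through dataset A, then dataset B.
        return _ChainedLoader(loader_a, loader_b)
```

You could also interleave batches instead of chaining them, depending on how you want the two datasets mixed within an epoch; the wrapper class is the place to change that.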
