Evaluating your model on more than one dataset


Transformers’ Trainer and TrainingArguments classes allow only one dataset to be used for evaluation. Is there a simple way of adding another one, so that after an epoch of training I could evaluate my model on both the training and development datasets and get metrics for both of them in one output? I know I could alter training_args.py or trainer.py, but I am pretty sure I would only mess things up…


I think the easiest way to do this is to use the new TrainerCallback system and write a callback that runs an additional evaluation on your other datasets during the on_evaluate event.

Is it possible to provide an example?

@sgugger, I had a brief look at the interfaces provided to achieve this, but I don’t see how this is possible. As I understand it, the TrainerCallback class receives the TrainingArguments that are used to initialise the Trainer, but this class does not allow us to pass additional eval_dataloaders. Therefore, I am not sure how to pass the additional datasets to the Trainer in the first place.
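
One possible way around this, as a minimal sketch rather than an official recipe: a callback is an ordinary Python object, so the extra datasets do not have to travel through TrainingArguments at all. They can be handed to the callback's own constructor together with a reference to the trainer, and the callback can then call trainer.evaluate(eval_dataset=..., metric_key_prefix=...) for each of them. The names ExtraEvalCallback, extra_datasets and last_metrics are made up for illustration. The sketch assumes that trainer.evaluate() fires on_evaluate again, which is why it carries a re-entrancy guard.

```python
try:
    from transformers import TrainerCallback
except ImportError:
    # Fallback so the sketch can be run without transformers installed.
    TrainerCallback = object


class ExtraEvalCallback(TrainerCallback):
    """Hypothetical callback that re-runs evaluation on extra named datasets."""

    def __init__(self, trainer, extra_datasets):
        self.trainer = trainer
        self.extra_datasets = extra_datasets  # e.g. {"train": train_ds}
        self.last_metrics = {}
        # Guard flag: trainer.evaluate() triggers on_evaluate again,
        # so without it this callback would recurse forever.
        self._running = False

    def on_evaluate(self, args, state, control, **kwargs):
        if self._running:
            return
        self._running = True
        try:
            self.last_metrics = {}
            for name, dataset in self.extra_datasets.items():
                # The prefix keeps the metrics apart, e.g. "train_loss".
                metrics = self.trainer.evaluate(
                    eval_dataset=dataset, metric_key_prefix=name
                )
                self.last_metrics.update(metrics)
        finally:
            self._running = False


# Assumed usage (model, training_args, train_ds and dev_ds are placeholders):
# trainer = Trainer(model=model, args=training_args,
#                   train_dataset=train_ds, eval_dataset=dev_ds)
# trainer.add_callback(ExtraEvalCallback(trainer, {"train": train_ds}))
# trainer.train()
```

With this approach each dataset is still evaluated and logged in its own pass, so the metrics show up as separate log entries distinguished by their prefix rather than as a single merged record; the last_metrics attribute is only a convenience for collecting them in one dict. It may also be worth checking the documentation for your installed version, since more recent releases of transformers appear to accept a dict of datasets as eval_dataset directly.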