Use Trainer API with two validation sets

Hi everyone,
right now the Trainer API accepts a single eval_dataset. I am wondering: is it somehow possible to provide two different validation sets that are both evaluated during training? For example, I might want to track validation loss both on a validation set previously sampled from my training data (and hence sharing its distribution) and on a validation set sampled from a dataset with a presumably different distribution (e.g., stemming from a different time period). The idea stems from “Don’t Stop Pretraining: Adapt Language Models to Domains and Tasks”.

Thanks in advance :slight_smile:

@patrickvonplaten this is a common use case in research (including in mine). I don’t see any examples in the docs, but the docs (Callbacks) suggest that there should be a way to write a callback function to achieve this? Looking at the code very briefly, I imagine that a callback would simply call evaluate again for each of the other eval datasets at the end of the validation loop, optionally changing metric_key_prefix so that the logger displays the metrics measured on the individual datasets as separate traces?

I’d have to look in more detail at the APIs for implementing a callback to provide more details, but that’s my initial thinking.

UPDATE: I had a look, but I can’t see how to pass the other datasets to the callback, so I assume this is not currently possible. To promote a productive discussion, I raised a feature request in which I summarise the issue and a potential workaround here.
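For what it’s worth, here is a rough sketch of the workaround I had in mind. Everything specific to it is my own invention (the `ExtraEvalCallback` class name, the `extra_datasets` argument, and the manual injection of the trainer reference after construction); only `TrainerCallback`, `Trainer.evaluate`, and its `metric_key_prefix` parameter come from the transformers API:

```python
from transformers import TrainerCallback


class ExtraEvalCallback(TrainerCallback):
    """Re-runs evaluation on additional datasets after the regular eval loop.

    Hypothetical sketch, not part of transformers itself.
    """

    def __init__(self, extra_datasets):
        # Mapping of name -> dataset; the name becomes part of the metric prefix,
        # so each dataset gets its own set of logged traces.
        self.extra_datasets = extra_datasets
        # The callback receives no handle to the Trainer, so a reference
        # must be injected manually after the Trainer is constructed.
        self.trainer = None
        self._running = False  # guard flag, see on_evaluate below

    def on_evaluate(self, args, state, control, **kwargs):
        # trainer.evaluate() fires on_evaluate again, so guard against
        # infinite recursion while we run the extra evaluations.
        if self._running or self.trainer is None:
            return
        self._running = True
        try:
            for name, dataset in self.extra_datasets.items():
                self.trainer.evaluate(
                    eval_dataset=dataset,
                    metric_key_prefix=f"eval_{name}",
                )
        finally:
            self._running = False


# Usage sketch (trainer construction omitted):
# callback = ExtraEvalCallback({"other_period": other_period_dataset})
# trainer = Trainer(..., callbacks=[callback])
# callback.trainer = trainer  # the manual injection is the awkward part
```

The manual `callback.trainer = trainer` step is exactly what makes this feel like a hack rather than a supported pattern, which is why I raised the feature request.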