Evaluate subset of data during training

Hi all,

I’m using the run_mlm.py script.

My evaluation set is “too large” (i.e., it takes too long to run through the entire evaluation set every n steps of training), so I was hoping to sub-sample from the evaluation set during training.

Is there a simple way to extend or use the Trainer with custom logic for sub-sampling examples from the provided eval_dataset? In the worst case, I can manually specify a subset of the eval set to be fed into the Trainer, but I was hoping to do a random subsample for each in-training evaluation so that I don’t overfit to one sub-sample of the evaluation set.


Why don't you pass a smaller evaluation dataset to the Trainer? You can then run trainer.evaluate(full_eval_dataset) afterwards to evaluate on the full validation set.
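
For example, something along these lines (just a sketch; the variable names are placeholders, and model, training_args, train_dataset and full_eval_dataset are assumed to already exist from your run_mlm.py setup):

```python
from transformers import Trainer

# Fixed sub-sample of the validation data, drawn once up front.
small_eval_dataset = full_eval_dataset.shuffle(seed=42).select(range(1000))

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=small_eval_dataset,  # used for the periodic in-training evaluations
)
trainer.train()

# One full pass over the complete validation set at the end.
metrics = trainer.evaluate(eval_dataset=full_eval_dataset)
```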


As Sahil said, he doesn't want to always evaluate on the same (small) subset of data, to avoid overfitting on that specific sub-sample. I agree, and I am facing the same issue.

As explained here, to inject custom behaviour you can subclass the Trainer and override the following methods:

  • get_train_dataloader — Creates the training DataLoader.
  • get_eval_dataloader — Creates the evaluation DataLoader.
  • get_test_dataloader — Creates the test DataLoader.
  • log — Logs information on the various objects watching training.
  • create_optimizer_and_scheduler — Sets up the optimizer and learning rate scheduler if they were not passed at init. Note that you can also subclass or override the create_optimizer and create_scheduler methods separately.
  • create_optimizer — Sets up the optimizer if it wasn’t passed at init.
  • create_scheduler — Sets up the learning rate scheduler if it wasn’t passed at init.
  • compute_loss — Computes the loss on a batch of training inputs.
  • training_step — Performs a training step.
  • prediction_step — Performs an evaluation/test step.
  • evaluate — Runs an evaluation loop and returns metrics.
  • predict — Returns predictions (with metrics if labels are available) on a test set.

What I would recommend (and will do) is to copy the original function and change just the minimum required to obtain the desired behaviour. I will try it out and then suggest adding an optional argument to evaluate on a random sub-sample of the evaluation set, as I think it might be useful for many.
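
A minimal sketch of that approach, overriding get_eval_dataloader so that each in-training evaluation draws a fresh random subset (the class name and the eval_subset_size argument are my own, and I'm assuming the eval set is a datasets.Dataset, so it has .select):

```python
import random
from transformers import Trainer

class RandomSubsetEvalTrainer(Trainer):
    """Trainer that evaluates on a fresh random subset of eval_dataset each time."""

    def __init__(self, *args, eval_subset_size=1000, **kwargs):
        super().__init__(*args, **kwargs)
        self.eval_subset_size = eval_subset_size

    def get_eval_dataloader(self, eval_dataset=None):
        # Only sub-sample the default eval set used during training; a dataset
        # passed explicitly (e.g. trainer.evaluate(full_eval_dataset)) is used as-is.
        if eval_dataset is None and self.eval_subset_size is not None \
                and len(self.eval_dataset) > self.eval_subset_size:
            indices = random.sample(range(len(self.eval_dataset)), self.eval_subset_size)
            eval_dataset = self.eval_dataset.select(indices)
        return super().get_eval_dataloader(eval_dataset)
```

Because a new subset is drawn on every call, the periodic evaluations during training don't keep hitting the same sub-sample, and a final trainer.evaluate(full_eval_dataset) still runs on the whole validation set.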

Hope this helps.