Evaluate subset of data during training

Hi all,

I’m using the run_mlm.py script.

My evaluation set is “too large”: running through the entire evaluation set every n steps of training takes too long, so I was hoping to sub-sample from it during training.

Is there a simple way to extend or use the Trainer with custom logic for sub-sampling examples from the provided eval_dataset? In the worst case, I can manually specify a subset of the eval set to be fed into the Trainer, but I was hoping to do a random subsample for each in-training evaluation so that I don’t overfit to one sub-sample of the evaluation set.

Thanks!

Why not just pass a smaller evaluation dataset? You can then run trainer.evaluate(full_eval_dataset) to evaluate on the full validation set.
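For example, here is a minimal sketch of that approach (assuming full_eval_dataset is a datasets.Dataset and that model, training_args, train_dataset, and data_collator are already set up as in run_mlm.py):

```python
from transformers import Trainer

# Assumes `model`, `training_args`, `train_dataset`, `data_collator`, and a
# datasets.Dataset called `full_eval_dataset` already exist (as in run_mlm.py).
small_eval_dataset = full_eval_dataset.shuffle(seed=42).select(range(1000))

trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=train_dataset,
    eval_dataset=small_eval_dataset,  # fixed subset used for in-training evaluation
    data_collator=data_collator,
)
trainer.train()

# One final pass over the full validation set after training.
metrics = trainer.evaluate(eval_dataset=full_eval_dataset)
```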

Hi,

As Sahil said, he does not want to always evaluate on the same (small) subset of data, to avoid overfitting to that specific sub-sample. I agree, and I am facing the same issue.

As explained here, to inject custom behaviour you can subclass the Trainer and override the following methods:

  • get_train_dataloader — Creates the training DataLoader.
  • get_eval_dataloader — Creates the evaluation DataLoader.
  • get_test_dataloader — Creates the test DataLoader.
  • log — Logs information on the various objects watching training.
  • create_optimizer_and_scheduler — Sets up the optimizer and learning rate scheduler if they were not passed at init. Note that you can also subclass or override the create_optimizer and create_scheduler methods separately.
  • create_optimizer — Sets up the optimizer if it wasn’t passed at init.
  • create_scheduler — Sets up the learning rate scheduler if it wasn’t passed at init.
  • compute_loss - Computes the loss on a batch of training inputs.
  • training_step — Performs a training step.
  • prediction_step — Performs an evaluation/test step.
  • evaluate — Runs an evaluation loop and returns metrics.
  • predict — Returns predictions (with metrics if labels are available) on a test set.

What I would recommend (and will do myself) is to copy the original function and change just the minimum required to obtain the desired behaviour. I will try it out and then suggest adding an optional argument for evaluating on a random subsample of the evaluation set, as I think it would be useful for many.
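For example, here is a rough sketch of that, overriding only get_eval_dataloader so that each in-training evaluation draws a fresh random subsample. The class name and the eval_subset_size argument are made up for illustration, and a datasets.Dataset is assumed so that .select is available:

```python
import random

from transformers import Trainer


class SubsampleEvalTrainer(Trainer):
    """Illustrative sketch: draw a new random subset of the eval set for each
    in-training evaluation. `eval_subset_size` is not part of the Trainer API."""

    def __init__(self, *args, eval_subset_size=1000, **kwargs):
        super().__init__(*args, **kwargs)
        self.eval_subset_size = eval_subset_size

    def get_eval_dataloader(self, eval_dataset=None):
        eval_dataset = eval_dataset if eval_dataset is not None else self.eval_dataset
        if eval_dataset is not None and len(eval_dataset) > self.eval_subset_size:
            # Fresh indices on every call, so each evaluation sees a different
            # sample and metrics are not tied to one fixed sub-sample.
            indices = random.sample(range(len(eval_dataset)), self.eval_subset_size)
            eval_dataset = eval_dataset.select(indices)
        # Let the parent class build the DataLoader from the subsample as usual.
        return super().get_eval_dataloader(eval_dataset)
```

For the final evaluation you can still call trainer.evaluate(eval_dataset=full_eval_dataset) to get metrics on the whole set.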

Hope this helps.

Best,
Emilio