CUDA out of memory when using Trainer with compute_metrics

When computing metrics inside the Trainer, your predictions are all gathered together on the device (GPU/TPU) and only passed back to the CPU at the end (because that operation can be slow). If your dataset is large (or your model outputs large predictions) you can use eval_accumulation_steps to set a number of steps after which your predictions are sent back to the CPU (slower but uses less device memory). This should avoid your OOM.

6 Likes