CUDA out of memory when using Trainer with compute_metrics

sgugger · December 23, 2020, 5:51pm

When you compute metrics inside the Trainer, all your predictions are accumulated on the device (GPU/TPU) and only moved back to the CPU at the end of evaluation, because that transfer can be slow. If your dataset is large (or your model outputs large predictions), you can set eval_accumulation_steps in your TrainingArguments to the number of steps after which the accumulated predictions are sent back to the CPU. This is slower but uses less device memory, and it should avoid your OOM.
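For reference, here is a minimal sketch of how that could look. The values (`output_dir="out"`, a batch size of 8, offloading every 16 steps) and the helper name `build_training_args` are illustrative assumptions, not recommendations; tune them to your own memory budget.

```python
# Keyword arguments for transformers.TrainingArguments.
# All values here are placeholders chosen for illustration.
eval_memory_kwargs = dict(
    output_dir="out",               # hypothetical output directory
    per_device_eval_batch_size=8,   # smaller batches also lower peak memory
    eval_accumulation_steps=16,     # move predictions to the CPU every 16 eval steps
)


def build_training_args(**overrides):
    """Hypothetical helper: build TrainingArguments from the kwargs above."""
    # Imported lazily so the settings can be inspected without loading transformers.
    from transformers import TrainingArguments

    return TrainingArguments(**{**eval_memory_kwargs, **overrides})
```

You would then pass `args=build_training_args()` to your `Trainer` together with your existing `compute_metrics` function. A lower `eval_accumulation_steps` frees device memory more often at the cost of more device-to-CPU transfers.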
