CUDA out of memory only during validation not training

sgugger · May 26, 2022, 11:33am

Here are a few things:

- Make sure your model returns only the logits and no extra tensors (hidden states, attentions, etc.), since all outputs are accumulated on the GPU during evaluation.
- Use `eval_accumulation_steps` to regularly offload the accumulated predictions from the GPU to the CPU (slower, but it avoids this OOM error).
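To see why extra outputs matter, here is a rough estimate of how much memory the accumulated evaluation outputs take. The sizes are hypothetical: a BERT-base-like model (hidden size 768, 13 hidden-state tensors when `output_hidden_states=True`), 1,000 validation examples padded to 128 tokens, and a 2-label classification head. It is a back-of-the-envelope sketch, not a measurement of what `Trainer` actually allocates.

```python
# Back-of-the-envelope estimate of how much memory the accumulated eval
# outputs occupy. All sizes below are hypothetical (BERT-base-like shapes).

BYTES_PER_FP32 = 4


def tensor_bytes(*shape, bytes_per_elem=BYTES_PER_FP32):
    """Number of bytes for a dense tensor with the given shape."""
    total = bytes_per_elem
    for dim in shape:
        total *= dim
    return total


num_examples = 1_000    # size of the validation set (assumed)
seq_len = 128           # padded sequence length (assumed)
hidden_size = 768       # BERT-base hidden size
num_hidden_states = 13  # embeddings + 12 layers when output_hidden_states=True
num_labels = 2          # binary classification head (assumed)

# What the metrics actually need: one row of logits per example.
logits_bytes = tensor_bytes(num_examples, num_labels)

# What is accumulated on top of that if the model also returns all hidden states.
hidden_bytes = num_hidden_states * tensor_bytes(num_examples, seq_len, hidden_size)

print(f"logits only:         {logits_bytes / 1e6:.3f} MB")
print(f"extra hidden states: {hidden_bytes / 1e9:.2f} GB")
```

With these numbers the logits need about 8 kB while the hidden states add roughly 5.1 GB, which is why turning off extra outputs (or dropping them before they are accumulated) can be enough to fix the OOM.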
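The effect of `eval_accumulation_steps` (an argument of `TrainingArguments` in 🤗 Transformers) can be pictured with a toy simulation: without it, every batch of predictions stays on the device until evaluation ends; with it, the device buffer is copied to the host and cleared every N steps. The function `simulate_eval` and the value 20 are hypothetical and only model the bookkeeping; no real tensors or GPUs are involved.

```python
# Toy model of periodic offloading: each evaluation step produces one batch
# of predictions that first lands in a "device" buffer. With an accumulation
# interval set, the buffer is moved to a "host" list and cleared every
# `eval_accumulation_steps` steps, so device usage stays bounded.


def simulate_eval(num_batches, eval_accumulation_steps=None):
    """Return (peak batches held on device, batches that ended up on host)."""
    device_buffer = []
    host_buffer = []
    peak_on_device = 0
    for step in range(1, num_batches + 1):
        device_buffer.append(f"predictions_{step}")  # stand-in for a GPU tensor
        peak_on_device = max(peak_on_device, len(device_buffer))
        if eval_accumulation_steps and step % eval_accumulation_steps == 0:
            host_buffer.extend(device_buffer)  # the (slow) GPU -> CPU copy
            device_buffer.clear()
    # Whatever is left is moved to the host once evaluation finishes.
    host_buffer.extend(device_buffer)
    device_buffer.clear()
    return peak_on_device, len(host_buffer)


print(simulate_eval(1_000))                              # no offloading
print(simulate_eval(1_000, eval_accumulation_steps=20))  # offload every 20 steps
```

Both runs end with all 1,000 batches on the host, but the peak held on the device drops from 1,000 batches to 20. The price is the extra device-to-host copies, which is why the option is slower.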
