Can't use multiple GPUs for evaluation with Trainer

Why does Trainer use multiple GPUs for training but only one GPU for evaluation? When I compared GPU usage during training and during evaluation, I found that during evaluation only GPU-0's memory usage increases, and GPU-0 is the only GPU whose GPU-Util is non-zero.
As a result, I have to set per_device_eval_batch_size to 1 or evaluation runs out of memory (OOM), which also makes evaluation slow.
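For anyone trying to reproduce the comparison above, here is a minimal sketch that polls per-GPU memory and utilization through `nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv,noheader,nounits`. It assumes `nvidia-smi` is on the PATH. The helper names `parse_gpu_stats`, `query_gpus` and `idle_gpus` are hypothetical and are not part of Trainer or any library.

```python
import subprocess
from typing import List, NamedTuple, Optional


class GpuStat(NamedTuple):
    index: int
    memory_used_mib: Optional[int]   # None if nvidia-smi reports "[N/A]"
    utilization_pct: Optional[int]   # None if nvidia-smi reports "[N/A]"


def _to_int(field: str) -> Optional[int]:
    # Some GPUs/drivers report "[N/A]" for a field; keep it as None instead of crashing.
    try:
        return int(field)
    except ValueError:
        return None


def parse_gpu_stats(csv_text: str) -> List[GpuStat]:
    """Parse the CSV produced by:
    nvidia-smi --query-gpu=index,memory.used,utilization.gpu --format=csv,noheader,nounits
    """
    stats = []
    for line in csv_text.strip().splitlines():
        if not line.strip():
            continue
        index, mem, util = (field.strip() for field in line.split(","))
        stats.append(GpuStat(int(index), _to_int(mem), _to_int(util)))
    return stats


def query_gpus() -> List[GpuStat]:
    """Run nvidia-smi once and return one GpuStat per visible GPU."""
    out = subprocess.run(
        [
            "nvidia-smi",
            "--query-gpu=index,memory.used,utilization.gpu",
            "--format=csv,noheader,nounits",
        ],
        capture_output=True,
        text=True,
        check=True,
    ).stdout
    return parse_gpu_stats(out)


def idle_gpus(stats: List[GpuStat]) -> List[int]:
    """Indices of GPUs that report exactly 0% utilization."""
    return [s.index for s in stats if s.utilization_pct == 0]
```

Call `query_gpus()` from a second terminal once while training is running and again during evaluation. If `idle_gpus(...)` lists every GPU except 0 only during evaluation, that matches the behaviour described here.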


I have the exact same issue. Have you reached any conclusions or managed to work around it?

Not yet :sob: