Can't use multiple GPUs for evaluation with Trainer

Why is it that when I use Trainer, multiple GPUs are used for training but only one GPU is used for evaluation? Comparing GPU usage during training and evaluation, I found that during evaluation only GPU-0's memory increases, and it is the only GPU whose GPU-Util is not 0.
As a result, per_device_eval_batch_size has to be 1 or evaluation goes OOM, which also makes evaluation slow.


Have the same exact issue. Have you come to any conclusions / managed to proceed?

Not yet :sob:

Any update on this? I am getting OOM for the same thing, only cuda:0 is being used…

I have heard about DataParallel and DistributedDataParallel, but using them appears to require pretty extensive refactoring of my training script…
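
For what it's worth, the distributed route may need less refactoring than it sounds when the script already uses Trainer. As far as I understand, launching the same script with `torchrun --nproc_per_node=<num_gpus> your_script.py` starts one process per GPU, and Trainer picks up the distributed setup from the environment, so evaluation batches should be split across processes as well. Below is a minimal sketch for checking how a script was launched. It assumes the launcher sets the `WORLD_SIZE` and `LOCAL_RANK` environment variables (torchrun does). `describe_launch` is a hypothetical helper, not part of transformers, and `your_script.py` is a placeholder name.

```python
import os


def describe_launch(env=None):
    """Report whether this process looks like it was started by a distributed launcher.

    Assumption: the launcher (e.g. torchrun) exports WORLD_SIZE and LOCAL_RANK.
    """
    env = os.environ if env is None else env
    # Fall back to single-process values when the variables are absent.
    world_size = int(env.get("WORLD_SIZE", "1"))
    local_rank = int(env.get("LOCAL_RANK", "-1"))
    return {
        "world_size": world_size,
        "local_rank": local_rank,
        "distributed": world_size > 1 and local_rank >= 0,
    }


if __name__ == "__main__":
    # Put this at the top of the training script to see what each process reports.
    print(describe_launch())
```

If this reports `distributed: False` when the script is started with plain `python your_script.py`, then it is running as a single process, and it may be worth trying the torchrun launch before rewriting anything by hand.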