Why does Trainer use multiple GPUs for training but only one GPU for evaluation? When I compared GPU usage during training and evaluation, I found that during evaluation only GPU-0's memory increases, and only its GPU-Util is non-zero.
As a result, I have to set per_device_eval_batch_size to 1, otherwise evaluation runs out of memory (OOM). It also makes evaluation slow.
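For context, here is what I expected to happen. My understanding (an assumption on my part, not something I have verified for my exact setup) is that in single-process multi-GPU mode the evaluation batch is per_device_eval_batch_size times the number of GPUs, and that batch is then split along the batch dimension across the GPUs. A plain-Python sketch of that arithmetic follows; the function names are hypothetical and are not part of transformers or torch.

```python
import math
from typing import List


def effective_eval_batch_size(per_device_eval_batch_size: int, n_gpu: int) -> int:
    """Total batch size per step in a single process (assumed behaviour)."""
    return per_device_eval_batch_size * max(1, n_gpu)


def split_across_gpus(batch_size: int, n_gpu: int) -> List[int]:
    """Per-GPU chunk sizes when a batch is split along dim 0, chunk-style."""
    if batch_size <= 0 or n_gpu <= 0:
        return []
    chunk = math.ceil(batch_size / n_gpu)  # size of every chunk except possibly the last
    sizes = []
    remaining = batch_size
    while remaining > 0:
        sizes.append(min(chunk, remaining))
        remaining -= chunk
    return sizes


# Hypothetical setup: 4 GPUs, per_device_eval_batch_size=1
total = effective_eval_batch_size(1, 4)
print(total, split_across_gpus(total, 4))
```

If that assumption holds, every GPU should receive one sample per step even with per_device_eval_batch_size=1, which is not what I see in practice.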
Have the same exact issue. Have you come to any conclusions / managed to proceed?
Not yet