Batch size in trainer eval loop

I am new to the Hugging Face Trainer. I tried to use it on T5. It looks to me that the training phase uses all GPUs, while in the evaluation phase I sometimes see per_device_batch_size * num_gpus as the batch size and at other times only per_device_batch_size.
Can someone shed light on whether multiple GPUs are used during evaluation, and what the effective batch size is during evaluation?
Also, what is the point of using eval_grad_accumulation_steps (an argument to Trainer) during eval? There is no model update based on the eval dataset… right?

Evaluation will use all GPUs, like training, so the effective batch size will be per_device_batch_size multiplied by the number of GPUs (it's logged at the beginning of the evaluation).
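To make the arithmetic concrete, here is a minimal sketch of that computation. The function name `effective_eval_batch_size` is hypothetical, not a Transformers API; it just expresses the product described above:

```python
def effective_eval_batch_size(per_device_eval_batch_size: int, num_gpus: int) -> int:
    """Each GPU processes its own batch in parallel, so the total number
    of examples evaluated per step is the per-device size times the
    number of devices (treat 0 GPUs, i.e. CPU-only, as a single device)."""
    return per_device_eval_batch_size * max(num_gpus, 1)


# e.g. per_device_eval_batch_size=8 on 4 GPUs -> 32 examples per eval step
print(effective_eval_batch_size(8, 4))  # 32
```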

Where exactly did you find eval_grad_accumulation_steps, I don’t see this anywhere in the Transformers code base.

The exact TrainingArguments name is eval_accumulation_steps. I misunderstood it as being about gradient accumulation. I guess it is about the maximum number of prediction steps to accumulate before moving the predictions to the CPU. But then I wonder: why is eval_accumulation_steps important? We could control the number of examples on the GPU using per_device_eval_batch_size… right?


No, the predictions are only moved to the CPU every eval_accumulation_steps steps if that argument is provided, not at every step. When it is left unset, they are moved to the CPU at the end of the whole evaluation loop. It's much faster to do this transfer only once, and most of the time the model's predictions don't take up much GPU memory.
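The logic above can be sketched in plain Python (no torch), as an illustration rather than the actual Trainer implementation. The helper names `eval_loop` and `to_cpu` are hypothetical; `to_cpu` stands in for the device-to-host copy, and the counter shows why unset means a single transfer:

```python
def eval_loop(batches, eval_accumulation_steps=None):
    """Simplified sketch of the evaluation loop's host-transfer logic.

    `batches` stands in for the per-step prediction tensors that would
    normally live on the GPU. Returns the gathered predictions and the
    number of device-to-host transfers performed.
    """
    transfers = 0
    buffer, host_preds = [], []

    def to_cpu(buf):
        nonlocal transfers
        transfers += 1          # one (costly) device-to-host copy
        host_preds.extend(buf)

    for step, preds in enumerate(batches, start=1):
        buffer.append(preds)    # predictions accumulate on the device...
        if eval_accumulation_steps and step % eval_accumulation_steps == 0:
            to_cpu(buffer)      # ...and are flushed every N steps if set
            buffer = []
    if buffer:
        to_cpu(buffer)          # unset: a single transfer at the very end
    return host_preds, transfers
```

With four batches and eval_accumulation_steps=2 there are two transfers; with the argument unset there is only one, which is the fast path as long as the predictions fit in GPU memory.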
