Batch size during training vs batch size during evaluation

I am confused about the difference between the batch size used during training and the batch size used during evaluation. I am trying to measure how batch size influences the inference time (prediction speed) of different NLP models after they have been trained with the Hugging Face Trainer API. The code I used is below:

from transformers import (
    AutoModelForSequenceClassification,
    Trainer,
    TrainingArguments,
    logging,
)

def print_summary(result):
    print(f"Time: {result.metrics['test_runtime']:.2f}")
    print(f"Samples/second: {result.metrics['test_samples_per_second']:.2f}")
    print(f"Test Loss: {result.metrics['test_loss']:.2f}")
    print(f"Test Accuracy: {result.metrics['test_accuracy']:.2f}")
    print(result.metrics)

logging.set_verbosity_error()

training_args = TrainingArguments(
    output_dir="cat",
    per_device_train_batch_size=1,
    per_device_eval_batch_size=1,
    do_train=False,
    do_predict=True,
)

# "distilrob" is a local fine-tuned checkpoint; compute_metrics and
# tokenized_datasets_test_distilrob are defined earlier in the script.
model = AutoModelForSequenceClassification.from_pretrained("distilrob")
trainer = Trainer(model=model, args=training_args, compute_metrics=compute_metrics)
result = trainer.predict(tokenized_datasets_test_distilrob)
print_summary(result)

My initial idea was that per_device_train_batch_size would have no effect, since training is already done and I am now only looking at the performance of the trained models, but its value does change the inference time. Why would it affect the inference time after training is complete, and what would be the correct setup if I want to measure the inference time (prediction speed) as (academically) precisely as possible?
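For reference, here is a minimal sketch of a more direct timing setup that bypasses the Trainer entirely. The model and dataset names are taken from the snippet above; the warmup count, batch size, and the assumption that the tokenized dataset contains only model inputs (input_ids, attention_mask, labels) are illustrative, not part of the original script. It runs a few warmup batches and, on GPU, synchronizes before reading the clock, since CUDA kernels execute asynchronously.

import time
import torch
from torch.utils.data import DataLoader
from transformers import default_data_collator

# Assumed to exist from the snippet above: model, tokenized_datasets_test_distilrob.
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device).eval()

loader = DataLoader(
    tokenized_datasets_test_distilrob,  # assumed name from the snippet above
    batch_size=1,                       # the batch size whose latency you want to measure
    collate_fn=default_data_collator,
)

with torch.no_grad():
    # Warmup: the first batches pay one-off costs (CUDA context, cuDNN autotuning).
    for i, batch in enumerate(loader):
        model(**{k: v.to(device) for k, v in batch.items()})
        if i == 4:
            break

    if device == "cuda":
        torch.cuda.synchronize()  # flush queued kernels before starting the clock
    start = time.perf_counter()
    n = 0
    for batch in loader:
        model(**{k: v.to(device) for k, v in batch.items()})
        n += batch["input_ids"].size(0)
    if device == "cuda":
        torch.cuda.synchronize()  # make sure all queued work finished before stopping
    elapsed = time.perf_counter() - start

print(f"{elapsed:.2f}s total, {n / elapsed:.2f} samples/second")

Reporting the median of several such runs, rather than a single measurement, also helps reduce noise from other processes on the machine.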


I also see differences in the evaluation results when running the Hugging Face Trainer with different evaluation batch sizes (per_device_eval_batch_size).
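One plausible cause, assuming dynamic padding is in play (the Trainer defaults to DataCollatorWithPadding when you pass it a tokenizer): each batch is padded to its longest sequence, so changing per_device_eval_batch_size changes the actual tensors the model sees. Padding positions are masked out, so in exact arithmetic the results should match, but floating-point kernels on differently shaped inputs can produce slightly different logits, which can flip a prediction near a decision boundary. A small illustration of the shape effect (the checkpoint and sentences here are made up for the example):

from transformers import AutoTokenizer, DataCollatorWithPadding

tok = AutoTokenizer.from_pretrained("distilroberta-base")  # illustrative checkpoint
collate = DataCollatorWithPadding(tokenizer=tok)

features = [tok(s) for s in ["short", "a much longer sentence than the first one"]]

# Batched together, the short example is padded up to the long one's length...
print(collate(features)["input_ids"].shape)

# ...but in batches of one, each example keeps its own length.
print(collate([features[0]])["input_ids"].shape)
print(collate([features[1]])["input_ids"].shape)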