Evaluation loss depends on batch size

I train a token classification model on a private dataset.
I noticed that the evaluation loss has different values if I change the value of per_device_eval_batch_size.
Is this a known issue?
The total test set size is not divisible by the batch size (per_device_eval_batch_size * gpu_num).
However, even when I use a batch size that does divide the test set size evenly, I still get a different value from the loss I compute directly with torch.nn.CrossEntropyLoss.
Also, accuracy, recall, etc. do not change with the batch size.
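
Here is a small, self-contained sketch of what I suspect is happening (made-up tensors rather than my real model, and a simplified stand-in for the actual evaluation loop, not the Trainer's code): averaging per-batch mean losses is not the same as taking one mean over all labelled tokens when the number of non-ignored tokens differs between batches.

```python
import torch
import torch.nn.functional as F

torch.manual_seed(0)
num_labels = 5
logits = torch.randn(10, 7, num_labels)          # 10 examples, 7 tokens each (made up)
labels = torch.randint(0, num_labels, (10, 7))
lengths = torch.randint(3, 8, (10,))             # varying number of labelled tokens per example
for i, n in enumerate(lengths):
    labels[i, int(n):] = -100                    # mask padding/special tokens

def eval_loss(batch_size):
    # Average of per-batch mean losses, roughly what a per-batch eval loop reports.
    batch_losses = []
    for start in range(0, logits.size(0), batch_size):
        lg = logits[start:start + batch_size].reshape(-1, num_labels)
        lb = labels[start:start + batch_size].reshape(-1)
        batch_losses.append(F.cross_entropy(lg, lb, ignore_index=-100))
    return torch.stack(batch_losses).mean().item()

# One global mean over every labelled token in the whole set.
global_loss = F.cross_entropy(
    logits.reshape(-1, num_labels), labels.reshape(-1), ignore_index=-100
).item()

print(eval_loss(2), eval_loss(5), global_loss)   # typically three slightly different numbers
```

Both batch sizes divide the 10 examples evenly, yet the averaged losses still differ from each other and from the single global mean, because each batch contributes its mean with equal weight regardless of how many labelled tokens it contains.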
Thanks for your help!

Apparently a known but neglected issue for several years…