Custom trainer evaluation function

Hi, I am trying to override the Trainer's evaluate() function with my own method. It runs, but when I use multiple GPUs (8 in my case), the eval dataset seems to get split across the 8 GPUs, and the reported metric is computed only on the eval subset from a single GPU.
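For reference, here is a minimal single-process sketch of what I suspect is happening (the shard/accuracy helpers are hypothetical illustrations, not the actual Trainer code): each "GPU" scores only its own shard, so reporting one rank's metric differs from summing the raw correct/total counts across all shards before dividing.

```python
# Simulate 8 processes each evaluating one shard of the eval set.

def shard(data, num_shards, rank):
    # Round-robin split, similar in spirit to a DistributedSampler.
    return data[rank::num_shards]

def counts(preds, labels):
    # Return raw (correct, total) counts rather than a ratio,
    # so counts from different shards can be summed safely.
    correct = sum(p == l for p, l in zip(preds, labels))
    return correct, len(labels)

preds  = [1, 0, 1, 1, 0, 1, 0, 0]
labels = [1, 1, 1, 0, 0, 1, 1, 0]
num_gpus = 8

per_shard = [counts(shard(preds, num_gpus, r), shard(labels, num_gpus, r))
             for r in range(num_gpus)]

# What I see now: only rank 0's metric, based on 1/8 of the data.
rank0_correct, rank0_total = per_shard[0]
print("rank0 metric:", rank0_correct / rank0_total)

# What I want: aggregate counts across shards, then divide.
total_correct = sum(c for c, _ in per_shard)
total_n = sum(n for _, n in per_shard)
print("global metric:", total_correct / total_n)  # 5/8 = 0.625
```

In the real multi-GPU case the per-shard counts would need to be gathered across processes (e.g. with torch.distributed) before computing the final metric.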

My boilerplate code is here: scratch/trainer_eval.py at master · kevinghst/scratch · GitHub

Could someone please take a look and help identify the underlying reason for this behavior?

Thanks,
Kevin