Evaluation metrics do not match with shuffle=False

Hi,

I am training a binary image classification model and evaluating it at the end of each epoch. If I create the DataLoader for the validation dataset with shuffle=True, the macro AUROC score is above 0.5 and shows an increasing trend over the later epochs. But if I create the DataLoader with shuffle=False, the macro AUROC score is very low (around 0.07) and does not cross 0.10 even by the end of the 100th epoch.

I also ran the same code without accelerate (on a single GPU), and there the macro AUROC score looked normal, i.e. 0.62, while evaluating the same checkpoint with accelerate gave <0.1.
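One thing I wonder about is whether the outputs need to be gathered across processes before computing the metric, so that each rank sees the full validation set rather than its own shard. A minimal sketch of what I mean, assuming `model`, `val_dataloader`, and `accelerator` come from `accelerator.prepare` (the helper name and variables are mine, just for illustration):

```python
import torch
from accelerate import Accelerator

def gather_eval_outputs(model, val_dataloader, accelerator: Accelerator):
    """Run the model over the validation set and collect the
    predictions/labels from every process onto each rank."""
    model.eval()
    all_probs, all_labels = [], []
    with torch.no_grad():
        for images, labels in val_dataloader:
            # assumes the model outputs one logit per sample, shape (B, 1)
            probs = torch.sigmoid(model(images).squeeze(-1))
            # gather_for_metrics also drops the duplicate samples that the
            # distributed sampler pads in to make the shards even.
            probs, labels = accelerator.gather_for_metrics((probs, labels))
            all_probs.append(probs.cpu())
            all_labels.append(labels.cpu())
    return torch.cat(all_probs), torch.cat(all_labels)
```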

I am using torchmetrics (torchmetrics.functional.classification.binary_auroc) for evaluation.
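For reference, the functional call I use looks like this (a runnable toy example; the numbers are made up):

```python
import torch
from torchmetrics.functional.classification import binary_auroc

preds = torch.tensor([0.1, 0.8, 0.4, 0.9])   # probabilities (logits also work)
target = torch.tensor([0, 1, 0, 1])          # integer class labels
print(binary_auroc(preds, target))           # tensor(1.) for this toy data
```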

Has anyone encountered a similar issue? I would appreciate any insights.

EDIT:
My batch_size is 256 and the total dataset size is 300. I tried to print the AUROC score for each batch:

| batch AUROC | batch_size |
|---|---|
| 0.0 | 256 |
| 0.6252874135971069 | 44 |

Somehow, the AUROC score for the first batch is returned as 0.0, and because of that my final AUROC score is erroneous. (My guess is that, with shuffle=False, the first 256 samples all belong to a single class, so AUROC is undefined on that batch.)
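If that is the cause, accumulating predictions across batches and computing AUROC once over the full validation set should avoid the undefined per-batch case. A sketch using the stateful metric from torchmetrics (the `batches` list is a toy stand-in for my per-batch model outputs):

```python
import torch
from torchmetrics.classification import BinaryAUROC

metric = BinaryAUROC()
# Toy stand-ins for per-batch predictions and labels.
batches = [
    (torch.tensor([0.2, 0.7]), torch.tensor([0, 1])),
    (torch.tensor([0.6, 0.1]), torch.tensor([1, 0])),
]
for probs, labels in batches:
    metric.update(probs, labels)   # accumulates state instead of averaging scores
print(metric.compute())            # AUROC computed over all samples at once
```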
