The bigger the batch size, the lower the throughput and GPU usage?

Hi, I’m having a similar issue with ViT, but my GPU usage is low and very inconsistent. Did you figure this out? I think it is IO-bound as well.
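One rough way to check the IO hypothesis is to time how long each iteration waits for the next batch versus how long the training step itself takes. This is only a framework-agnostic sketch: `batches` and `train_step` are hypothetical stand-ins for your data loader and one forward/backward pass. If you are on PyTorch with CUDA, kernels run asynchronously, so you would need to call `torch.cuda.synchronize()` at the end of `train_step` for the compute timing to mean anything.

```python
import time
from typing import Callable, Iterable, Tuple


def measure_input_stall(batches: Iterable, train_step: Callable) -> Tuple[float, float]:
    """Return (seconds spent waiting for data, seconds spent in train_step).

    `batches` and `train_step` are hypothetical placeholders for your own
    data loader and training step.
    """
    wait_time = 0.0
    compute_time = 0.0
    it = iter(batches)
    while True:
        t0 = time.perf_counter()
        try:
            # Time spent here is time the GPU would sit idle waiting for input.
            batch = next(it)
        except StopIteration:
            break
        t1 = time.perf_counter()
        # With CUDA, train_step should synchronize before returning,
        # otherwise this only measures kernel launch time.
        train_step(batch)
        t2 = time.perf_counter()
        wait_time += t1 - t0
        compute_time += t2 - t1
    return wait_time, compute_time
```

If `wait_time` dominates, the loader (disk reads, decoding, augmentation) is likely the bottleneck, and a larger batch size would just make each wait longer rather than keep the GPU busier.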