Unable to reproduce results claimed in research paper using MobileViT torch model

Hi,

My code is here: MobileViT-HuggingFace/inference.py, on the master branch of vivek-golani/MobileViT-HuggingFace on GitHub.
I am using the pretrained HuggingFace MobileViTForSemanticSegmentation model and calculating the mean IoU (mIoU) on VOCSegmentation 2012 from torchvision.datasets. For preprocessing, I am using the image processor from HuggingFace as well. However, I am not getting the validation numbers claimed in the paper.
The paper claims a validation mIoU of about 0.73, but I get 0.58.
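
For reference, here is a minimal sketch of a dataset-level mIoU computation. It is only an illustration and not the code from the repository linked above. It assumes 21 classes (20 VOC object classes plus background), that label value 255 marks void pixels to be ignored, and that predictions are class indices already resized to the label resolution. The names `update_confusion` and `mean_iou` are hypothetical helpers, not part of any library.

```python
import numpy as np

NUM_CLASSES = 21    # assumption: 20 VOC object classes + background
IGNORE_INDEX = 255  # assumption: VOC void/boundary pixels are labelled 255


def update_confusion(conf, pred, target,
                     num_classes=NUM_CLASSES, ignore_index=IGNORE_INDEX):
    """Add one image's pixels to a (num_classes x num_classes) confusion matrix.

    Rows are ground-truth classes, columns are predicted classes.
    `pred` must hold class indices in [0, num_classes).
    """
    pred = np.asarray(pred).ravel()
    target = np.asarray(target).ravel()
    valid = target != ignore_index  # drop void pixels
    idx = num_classes * target[valid].astype(np.int64) + pred[valid].astype(np.int64)
    conf += np.bincount(idx, minlength=num_classes ** 2).reshape(num_classes, num_classes)
    return conf


def mean_iou(conf):
    """Mean IoU over classes that occur in the predictions or the ground truth."""
    tp = np.diag(conf).astype(np.float64)
    union = conf.sum(axis=0) + conf.sum(axis=1) - tp
    present = union > 0  # skip classes that never appear
    return float((tp[present] / union[present]).mean())


# Tiny usage example with 3 classes; the last pixel is void and is ignored.
conf = np.zeros((3, 3), dtype=np.int64)
update_confusion(conf, pred=[0, 1, 1, 2], target=[0, 1, 2, 255], num_classes=3)
print(mean_iou(conf))
```

In this sketch the confusion matrix is accumulated over the whole validation set and the IoU is taken once at the end, which can give a different number from averaging per-image IoU values.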
Please let me know if I am missing anything here.

Thanks in advance!