Question about deepset's BERT cased model trained on SQuAD2

This model gets a much higher F1/EM when evaluated on the SQuAD2 validation set than the model card reports. Is the card outdated?
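For anyone comparing numbers, here is a minimal sketch of how EM and F1 are usually computed per question for SQuAD-style data. It assumes the usual normalization (lowercase, strip punctuation, drop articles, collapse whitespace) and that an unanswerable question is scored against an empty string. The function names are illustrative and not taken from the model card or any library, and the numbers on the card may have been produced with a different script or different settings.

```python
import re
import string
from collections import Counter


def normalize_answer(text):
    """Lowercase, strip punctuation, drop articles, collapse whitespace."""
    text = text.lower()
    text = "".join(ch for ch in text if ch not in string.punctuation)
    text = re.sub(r"\b(a|an|the)\b", " ", text)
    return " ".join(text.split())


def exact_match(prediction, truth):
    """1.0 if the normalized strings are identical, else 0.0."""
    return float(normalize_answer(prediction) == normalize_answer(truth))


def f1_score(prediction, truth):
    """Token-level F1 between the normalized prediction and gold answer."""
    pred_tokens = normalize_answer(prediction).split()
    truth_tokens = normalize_answer(truth).split()
    # If either side is empty (no-answer case), F1 is 1.0 only when both are.
    if not pred_tokens or not truth_tokens:
        return float(pred_tokens == truth_tokens)
    overlap = sum((Counter(pred_tokens) & Counter(truth_tokens)).values())
    if overlap == 0:
        return 0.0
    precision = overlap / len(pred_tokens)
    recall = overlap / len(truth_tokens)
    return 2 * precision * recall / (precision + recall)


def best_over_truths(metric, prediction, truths):
    """Score against every gold answer and keep the best.

    Assumption: an empty gold list means the question is unanswerable,
    so the prediction is compared to the empty string.
    """
    return max(metric(prediction, t) for t in (truths or [""]))
```

Averaging `best_over_truths(exact_match, ...)` and `best_over_truths(f1_score, ...)` over all validation questions, then multiplying by 100, gives the headline EM and F1. How empty predictions are produced (for example, any no-answer threshold) can shift both numbers noticeably, so that is worth checking before concluding the card is out of date.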