Which split to use for MRPC: MobileBERT on GLUE

Hi,

I fine-tuned the google/mobilebert-uncased model on MRPC and got ~87% accuracy on the validation split. When I evaluate the same model on the test split, I get only ~83%. The MobileBERT paper also reports roughly 87%.
My question: Am I doing something wrong, or does the paper report results on the validation split?
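
To make the comparison concrete, here is a minimal sketch of how the two numbers are computed: the same predictor is scored separately on each split. Everything in it is a placeholder for illustration. `toy_predict` stands in for the fine-tuned model, and `toy_splits` is made-up data, not MRPC.

```python
from typing import Callable, Dict, List, Tuple

# (sentence1, sentence2, label); label 1 means "paraphrase", as in MRPC.
Example = Tuple[str, str, int]


def accuracy(predict: Callable[[str, str], int], examples: List[Example]) -> float:
    """Fraction of examples where the predicted label matches the gold label."""
    if not examples:
        raise ValueError("split is empty")
    correct = sum(1 for s1, s2, label in examples if predict(s1, s2) == label)
    return correct / len(examples)


def accuracy_by_split(
    predict: Callable[[str, str], int], splits: Dict[str, List[Example]]
) -> Dict[str, float]:
    """Score the same predictor on every split so the numbers are comparable."""
    return {name: accuracy(predict, examples) for name, examples in splits.items()}


def toy_predict(s1: str, s2: str) -> int:
    # Hypothetical stand-in for the fine-tuned model: predicts "paraphrase"
    # when both sentences start with the same word.
    return int(s1.split()[0].lower() == s2.split()[0].lower())


# Made-up data for illustration only (not taken from MRPC).
toy_splits: Dict[str, List[Example]] = {
    "validation": [
        ("The cat sat.", "The cat was sitting.", 1),
        ("Stocks fell today.", "Rain is expected.", 0),
    ],
    "test": [
        ("He left early.", "He went home early.", 1),
        ("Prices rose.", "Costs went up.", 1),
    ],
}

scores = accuracy_by_split(toy_predict, toy_splits)
for name, score in scores.items():
    print(f"{name}: {score:.2f}")
```

In my real run, the validation and test examples come from the MRPC splits and the predictor is the fine-tuned model; the scoring logic is the same for both splits.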