I fine-tuned the google/mobilebert-uncased model on MRPC and got ~87% validation accuracy.
When I evaluate on the test split, I get only ~83%. The MobileBERT paper also reports
something like 87%.
My question: am I doing something wrong, or does the paper report results on the validation split?
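
To be explicit about what I am comparing: both numbers are plain accuracy, i.e. the fraction of sentence pairs whose predicted label matches the gold label, computed separately per split. Below is a minimal sketch of that metric. The `accuracy` helper and the toy label lists are illustrative assumptions, not my actual evaluation script or real model outputs.

```python
def accuracy(predictions, labels):
    """Fraction of predictions that match the gold labels."""
    if len(predictions) != len(labels):
        raise ValueError("predictions and labels must have the same length")
    if not labels:
        raise ValueError("cannot compute accuracy on an empty split")
    correct = sum(int(p == y) for p, y in zip(predictions, labels))
    return correct / len(labels)


# Toy stand-ins for model outputs on each split (made-up values, not real results).
val_preds, val_labels = [1, 0, 1, 1], [1, 0, 0, 1]
test_preds, test_labels = [1, 1, 0, 0, 1], [1, 0, 0, 1, 1]

val_acc = accuracy(val_preds, val_labels)
test_acc = accuracy(test_preds, test_labels)
print(f"validation accuracy: {val_acc:.2f}")
print(f"test accuracy: {test_acc:.2f}")
```

The same function is applied to both splits, so any gap comes from the predictions and labels themselves rather than from how the metric is computed.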