This model gets much higher F1/EM scores on the SQuAD 2.0 validation data than the ones reported in its model card. Any ideas why that is?