Why don't we use a test dataset for the Question Answering task?

In the Question Answering task, I read this text: “Preprocessing the validation data will be slightly easier as we don’t need to generate labels
(unless we want to compute a validation loss, but that number won’t really help us understand how good the model is).”
1. Why won't the validation loss number really help us?
2. Why don't we split the dataset into three parts: train, validation, and test?
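
To make the second question concrete, this is roughly the kind of three-way split I have in mind. It is only a minimal sketch in plain Python: the function name `three_way_split`, the 80/10/10 ratios, and the fixed seed are my own assumptions for illustration and are not taken from the text I quoted.

```python
import random


def three_way_split(examples, val_frac=0.1, test_frac=0.1, seed=0):
    """Shuffle a copy of `examples` and cut it into train/validation/test lists.

    Hypothetical helper: the fractions and seed are illustrative defaults.
    """
    items = list(examples)
    # Seeded shuffle so the split is reproducible between runs.
    random.Random(seed).shuffle(items)

    n_val = int(len(items) * val_frac)
    n_test = int(len(items) * test_frac)
    n_train = len(items) - n_val - n_test

    train = items[:n_train]
    val = items[n_train:n_train + n_val]
    test = items[n_train + n_val:]
    return train, val, test


# Stand-in data: 100 integer ids instead of real question/context pairs.
train, val, test = three_way_split(range(100))
print(len(train), len(val), len(test))
```

With a split like this, the validation part would be used during training and the test part would only be touched once at the end.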