How to Add Validation Loss to run_squad.py?

I’m using bert-base-uncased as the model on SQuAD v1.1 and have args.evaluate_during_training set to True. I tried adding "start_positions": batch[3] and "end_positions": batch[4] to the inputs in the evaluate() method so that BertForQuestionAnswering returns the total loss.
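For reference, this is roughly the change I made inside the evaluation loop (a sketch rather than a verbatim copy of the script; the variable names follow run_squad.py, and the eval_loss accumulation is just there to show what I want to track):

```python
# Sketch of my edit inside evaluate() in run_squad.py
for batch in tqdm(eval_dataloader, desc="Evaluating"):
    model.eval()
    batch = tuple(t.to(args.device) for t in batch)

    with torch.no_grad():
        inputs = {
            "input_ids": batch[0],
            "attention_mask": batch[1],
            "token_type_ids": batch[2],
            # Added these two entries so BertForQuestionAnswering also
            # returns a loss, mirroring what train() passes in:
            "start_positions": batch[3],
            "end_positions": batch[4],
        }
        outputs = model(**inputs)
        # With labels supplied, the first output is the total (start + end) loss
        eval_loss += outputs[0].item()
```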

However, when I try to do that, I get the following CUDA assertion failure:

cunn_ClassNLLCriterion_updateOutput_kernel(Dtype *, Dtype *, Dtype *, long *, Dtype *, int, int, int, int, long) [with Dtype = float, Acctype = float]: block: [0,0,0], thread: [0,0,0] Assertion t >= 0 && t < n_classes failed.

What might be the problem? The only difference from training is that, to get the loss on the dev set, I pass start_positions and end_positions to the model during evaluation, the same way they are passed during training.

I also noticed that the dev dataset contains multiple possible answers for each question. Is there a way to account for that when computing validation loss, or is there a better way to check whether the model is overfitting?
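To make the question concrete, something like the sketch below is what I had in mind for handling multiple gold answers: multi_answer_loss and the per-answer (start, end) token spans are placeholders I made up, and the start/end averaging just mirrors how BertForQuestionAnswering combines its two cross-entropy losses.

```python
import torch

def multi_answer_loss(start_logits, end_logits, answer_spans):
    """Hypothetical per-example loss against several annotated answers.

    start_logits, end_logits: 1-D tensors of length seq_len for one example.
    answer_spans: list of (start_position, end_position) token indices,
        one pair per acceptable answer in the dev set.
    Returns the smallest loss over the annotated spans, so matching any
    one of the gold answers is enough.
    """
    loss_fct = torch.nn.CrossEntropyLoss()
    losses = []
    for start_pos, end_pos in answer_spans:
        start_loss = loss_fct(
            start_logits.unsqueeze(0),
            torch.tensor([start_pos], device=start_logits.device),
        )
        end_loss = loss_fct(
            end_logits.unsqueeze(0),
            torch.tensor([end_pos], device=end_logits.device),
        )
        losses.append((start_loss + end_loss) / 2)
    return torch.stack(losses).min()
```

Taking the minimum means an example is not penalized as long as the model matches any one of its annotated answers, but I’m not sure whether that is a reasonable way to define validation loss, hence the question.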