Question about validation and testing loss

Just double-checking my numbers here. Would these training and validation numbers be indicative of overfitting? I feel like I might be on the verge, but I would very much appreciate a second opinion.


| Training Loss | Epoch | Step | Validation Loss | Precision | Accuracy | F1     |
|---------------|-------|------|-----------------|-----------|----------|--------|
| 0.5894        | 0.4   | 500  | 0.4710          | 0.8381    | 0.7747   | 0.7584 |
| 0.3863        | 0.8   | 1000 | 0.3000          | 0.8226    | 0.8737   | 0.8858 |
| 0.2272        | 1.2   | 1500 | 0.1973          | 0.9593    | 0.9333   | 0.9329 |
| 0.1639        | 1.6   | 2000 | 0.1694          | 0.9067    | 0.9367   | 0.9403 |
| 0.1263        | 2.0   | 2500 | 0.1128          | 0.9657    | 0.9597   | 0.9603 |
| 0.0753        | 2.4   | 3000 | 0.1305          | 0.9614    | 0.9670   | 0.9679 |
| 0.0619        | 2.8   | 3500 | 0.1246          | 0.9633    | 0.9697   | 0.9705 |

Rather than the validation loss itself, this looks like a case where the training and validation data overlap.

Go through the validation data carefully and check whether any of it has somehow crept into the training data.
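One quick way to run that check is a set intersection over lightly normalised texts. A minimal sketch, with made-up example lists standing in for your real splits:

```python
# Hypothetical stand-ins for your actual training/validation texts.
train_texts = ["the movie was great", "terrible plot", "loved the acting"]
val_texts = ["loved the acting", "boring and slow"]

# Normalise lightly so near-identical duplicates are caught too.
def normalise(s):
    return " ".join(s.lower().split())

overlap = {normalise(t) for t in train_texts} & {normalise(t) for t in val_texts}
print(f"{len(overlap)} overlapping example(s): {overlap}")
```

If back-translated copies are in play, an exact-match check like this can still miss paraphrased duplicates, but it catches the obvious leakage.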

Thanks for commenting and it’s nice to meet you! I’m positive my testing/validation data were defined. I augmented my data set with back-translation. I removed my duplicates but maybe that could be it? Would you mind explaining your thoughts a bit more?


OK. If you are using data augmentation, then you need to make sure of two things:

  1. Do the augmentation after the train/test split. That way you make sure training data does not accidentally creep into the test set.
  2. Only run the augmentation on the training data.

Here is a reference on this: python - Data augmentation before splitting - Stack Overflow.
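The two rules above can be sketched in a few lines. This is illustrative only: `back_translate` is a hypothetical stand-in for whatever augmentation you actually run, and the data is synthetic.

```python
import random

# Synthetic (text, label) pairs standing in for your real dataset.
data = [(f"example text {i}", i % 2) for i in range(10)]

# 1) Split FIRST, so augmented copies of a training example
#    can never land in the validation/test set.
random.seed(0)
random.shuffle(data)
split = int(0.8 * len(data))
train, val = data[:split], data[split:]

# 2) Augment ONLY the training split. back_translate is a stand-in
#    for your real back-translation pipeline (hypothetical helper).
def back_translate(text):
    return text + " (augmented)"

train += [(back_translate(t), y) for t, y in train]

print(len(train), len(val))  # validation set is untouched by augmentation
```

Doing it in the other order (augment, then split) is exactly how a back-translated copy of a training sentence ends up in the validation set.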

Now that I think about it, this could mean overfitting as well; the validation results look too good. One more thing you can try is increasing the number of epochs. If the F1 then starts decreasing, there is a high chance of overfitting. Please take another look at the way you are using the augmented data.

OK, great! The accuracy and F1 both stagnate after 3 epochs. I'll read up. Thanks for the ideas :slight_smile:

Just FYI: we typically make a distinction between a test set and a validation set. They are not the same thing. A validation (or development) set is used during training to intermediately evaluate the model, i.e. to probe the current performance every x steps/epochs. This way we can quickly see if something is wrong (data pollution, overfitting). A test set is a hold-out set that we only ever use after training (not during training, not even for evaluation, and also not for cross-validation training or hyperparameter search). It should be explicitly held out to ensure that the trained model is not biased towards it at all.

Tl;dr: validation/dev set: to monitor training; test set: hold-out for final evaluation (yes, it is confusing)
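In code, the distinction just comes down to a disjoint three-way split. A minimal sketch with synthetic data and illustrative 80/10/10 proportions:

```python
import random

# Synthetic examples standing in for a real dataset.
examples = list(range(100))
random.seed(0)
random.shuffle(examples)

n = len(examples)
train = examples[: int(0.8 * n)]                    # fit the model
validation = examples[int(0.8 * n): int(0.9 * n)]   # monitor every x steps/epochs
test = examples[int(0.9 * n):]                      # touched once, after training

# The three sets must be pairwise disjoint.
assert not set(train) & set(validation)
assert not set(train) & set(test)
assert not set(validation) & set(test)
print(len(train), len(validation), len(test))  # 80 10 10
```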

Thanks for mentioning it. It was a typing accident lol. Nonetheless, I appreciate your explanation :slight_smile: