Seeking Clarification: Model Evaluation - Train and Val loss

Hi there,

I’m unsure whether my model is performing well. My understanding was that if the training and validation losses track each other closely, with no widening gap between them, it indicates neither underfitting nor overfitting. However, my professor disagrees and says that my model is not adequately trained, suggesting the graph itself is incorrect. I’m uncertain where the issue lies or what exactly is wrong. Can anyone please offer some guidance or assistance?

Thanks in advance.


This is very model dependent.

For me, a loss greater than 1 is usually not a good sign.

However, we should not rely on losses alone. You need metrics to properly evaluate model performance.

If you can share some details about your model, data, and training scripts, we’ll be able to assist better.

Hey, thanks for your response.

I’m currently in the process of training a masked language model, which can be seen as retraining the roberta-base model for a specific domain. Other research papers have also reported a loss in the range of 2.5 to 3, so I’m not overly concerned about how low the loss is.

However, I have some doubts about the shape of the train and validation loss curves mentioned in the comment above. If the graph is indeed correct, how should I interpret it, and what justifications support that interpretation?

Additionally, the perplexity score of this model is 25, and I’ll be attaching the perplexity graph for reference. Furthermore, I’ve performed predictions on masked words for a few sentences, and the results appeared to be quite convincing.
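For a masked language model, perplexity is simply the exponential of the mean cross-entropy loss, so the two numbers you report can be checked against each other. A minimal sketch (the specific loss values here are illustrative):

```python
import math

def perplexity_from_loss(loss: float) -> float:
    # MLM perplexity is exp of the mean cross-entropy loss
    return math.exp(loss)

# A perplexity of 25 corresponds to a loss of ln(25) ~= 3.22
print(round(math.log(25), 2))                 # 3.22
print(round(perplexity_from_loss(3.22), 1))   # ~25
```

This is a quick sanity check: if your reported perplexity and your loss curve don’t satisfy this relationship, one of the two is being computed or plotted incorrectly.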

In my experience, people try to infer too much meaning from their loss curves. Sometimes they are glaringly bad, but often you need to see them in context to draw any meaningful conclusions. (Your minimum is around 3.3, which is higher than the 2.5–3 reported in the literature.)

I think perhaps what your supervisor is saying is that you could train for longer to reach a lower minimum. You could use an early stopping criterion to identify automatically when to stop training (for example, stop if the validation loss doesn’t decrease for an epoch). Also, have you experimented with hyperparameter tuning? Adjusting the learning rate and similar settings can often yield a loss curve that descends more steeply in the early epochs. Cosine LR scheduling can also help, since it combines an exploration phase and an exploitation phase; the former helps avoid local minima.
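To make the two suggestions above concrete, here is a minimal sketch of a patience-based early stopping check and a warmup-plus-cosine LR schedule. The class name, parameters, and loss values are my own illustrations, not anything from your training script; in practice you would likely reach for the equivalents your framework already ships (e.g. a trainer callback and a built-in scheduler):

```python
import math

class EarlyStopping:
    """Stop training when validation loss hasn't improved for `patience` epochs."""
    def __init__(self, patience: int = 1, min_delta: float = 0.0):
        self.patience = patience
        self.min_delta = min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        # Returns True when training should stop
        if val_loss < self.best - self.min_delta:
            self.best = val_loss
            self.bad_epochs = 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def cosine_lr(step: int, total_steps: int, warmup_steps: int, base_lr: float) -> float:
    # Linear warmup, then cosine decay toward zero
    if step < warmup_steps:
        return base_lr * step / warmup_steps
    progress = (step - warmup_steps) / (total_steps - warmup_steps)
    return base_lr * 0.5 * (1 + math.cos(math.pi * progress))

# Example: validation loss plateaus after epoch 2, so patience=2 stops at epoch 4
stopper = EarlyStopping(patience=2)
for epoch, loss in enumerate([3.8, 3.5, 3.4, 3.41, 3.42, 3.43]):
    if stopper.step(loss):
        print(f"stopping at epoch {epoch}")  # stopping at epoch 4
        break
```

The early LR warmup and the large early learning rate are what give the "exploration" behavior; the cosine decay at the end is the "exploitation" phase where the model settles into a minimum.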

Though this is just an educated guess. To really understand your professor I recommend that you ask them specific questions about their doubts. Why exactly do they feel as if your model is not adequately trained?

This level of communication will help build a good relationship with your prof. It’s entirely possible he is just pushing back so you will find the confidence to defend your work.