Training loss increases suddenly at the beginning of each epoch

Hi all,
I’ve used the Transformers library for a while and have tried different models (BERT, BART, ViT, etc.) with the provided examples. However, I’ve found that the training loss always increases suddenly at the beginning of each epoch. The figure below is an example:
[figure: training loss curve with a visible spike at the start of each epoch]

At first, I thought this was because the training dataset is not shuffled after each epoch. However, a related topic indicates that the Trainer class should handle this for us.
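For reference, here is a minimal sketch (plain PyTorch, not Trainer internals) of the behavior that topic describes: a DataLoader built with shuffle=True draws a fresh permutation of the dataset every epoch, so shuffling alone shouldn't explain a spike at each epoch boundary.

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Toy dataset of the integers 0..9, so the sample order is easy to read off.
dataset = TensorDataset(torch.arange(10))
loader = DataLoader(dataset, batch_size=5, shuffle=True)

for epoch in range(2):
    # Flatten the batches back into a single list to see the epoch's order.
    order = [x.item() for (batch,) in loader for x in batch]
    print(f"epoch {epoch}: {order}")  # a different permutation each epoch
```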

Has anyone else experienced this problem? Any comment would be really appreciated!

Are you sure it’s not the validation loss being logged incorrectly?
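One quick way to check: after training, the Trainer keeps every logged record in trainer.state.log_history, where training records carry a "loss" key and evaluation records carry "eval_loss". A rough sketch (assuming `trainer` is your already-trained transformers.Trainer instance) that separates the two, so you can re-plot them and see whether the spike comes from eval entries mixed into the training curve:

```python
# Training-step records have a "loss" key; eval records have "eval_loss".
train_losses = [(rec["step"], rec["loss"])
                for rec in trainer.state.log_history if "loss" in rec]
eval_losses = [(rec["step"], rec["eval_loss"])
               for rec in trainer.state.log_history if "eval_loss" in rec]

print(train_losses[:5])  # (global_step, training loss) pairs
print(eval_losses[:5])   # (global_step, validation loss) pairs
```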