I’ve been using the Transformers library for a while and have tried different models (BERT, BART, ViT, etc.) with the provided examples. However, I’ve found that the training loss always increases suddenly at the beginning of each epoch. The figure below is an example:
At first, I thought this was because the training dataset is not shuffled after each epoch. However, there is a related topic indicating that the Trainer class should handle this for us.
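For reference, here is a minimal sketch of the per-epoch reshuffling behavior I would expect from the Trainer. This is purely illustrative (the function name and seeding scheme are mine, not the Trainer internals): each epoch draws a fresh permutation of the dataset, so batch order changes between epochs while the run stays reproducible.

```python
import random

def epoch_order(dataset_size: int, epoch: int, base_seed: int = 42) -> list[int]:
    # Mimic a shuffling sampler: derive a fresh permutation per epoch
    # from (base_seed + epoch), so the order changes every epoch but
    # the whole run is still reproducible.
    rng = random.Random(base_seed + epoch)
    indices = list(range(dataset_size))
    rng.shuffle(indices)
    return indices

# Sample order for three consecutive epochs over a 10-example dataset.
orders = [epoch_order(10, e) for e in range(3)]
for e, order in enumerate(orders):
    print(f"epoch {e}: {order}")
```

If shuffling were the culprit, I would expect each epoch to cover all samples exactly once but in a different order. In practice, I believe the sampler can be inspected via `trainer.get_train_dataloader()` to confirm this, though I may be missing something.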
Has anyone experienced such a problem? Any comment would be really appreciated!