RoBERTa pre-training checkpoints give inconsistent downstream performance across epochs

I am pre-training RoBERTa-Large with masked language modelling (MLM) and then fine-tuning it on a different downstream task. I have noticed that the model's downstream performance is inconsistent across pre-training checkpoints. For example, when I take the pre-trained checkpoints after 3, 5, and 7 epochs, performance improves from the 3rd-epoch to the 7th-epoch checkpoint, but the 5th-epoch checkpoint is considerably worse than both.
All fine-tuning hyperparameters remain the same across runs; a minimal sketch of the setup is included below. What could be a possible reason for this inconsistency?
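
For concreteness, here is a minimal sketch of the kind of loop I mean, written against the Hugging Face Transformers Trainer API. The checkpoint paths, the tiny dummy dataset, and the hyperparameter values are placeholders rather than my actual setup; the point is that every checkpoint is fine-tuned with the same fixed seed and the same hyperparameters.

```python
from datasets import Dataset
from transformers import (
    RobertaForSequenceClassification,
    RobertaTokenizer,
    Trainer,
    TrainingArguments,
    set_seed,
)

# Placeholder paths to the MLM checkpoints saved after 3, 5, and 7 epochs
checkpoints = ["mlm-checkpoint-epoch3", "mlm-checkpoint-epoch5", "mlm-checkpoint-epoch7"]

tokenizer = RobertaTokenizer.from_pretrained("roberta-large")

# Stand-in for the real downstream dataset (binary classification here)
texts = ["an example sentence", "another example sentence"] * 4
labels = [0, 1] * 4
encodings = tokenizer(texts, truncation=True, padding=True)
train_dataset = Dataset.from_dict({**encodings, "labels": labels})

for ckpt in checkpoints:
    set_seed(42)  # identical seed for every fine-tuning run

    # The MLM head is discarded; a freshly initialised classification head is added
    model = RobertaForSequenceClassification.from_pretrained(ckpt, num_labels=2)

    args = TrainingArguments(
        output_dir=f"finetuned-{ckpt}",
        num_train_epochs=3,
        per_device_train_batch_size=8,
        learning_rate=2e-5,
        seed=42,
    )

    trainer = Trainer(
        model=model,
        args=args,
        train_dataset=train_dataset,
        eval_dataset=train_dataset,  # placeholder; the real runs use a held-out set
    )
    trainer.train()
    print(ckpt, trainer.evaluate())
```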