Different loss values for trained and saved model

I’m training a custom BERT-like model via Trainer. Whenever I load a checkpoint, the model produces results barely distinguishable from random prediction (~9.5 loss versus ~10.3 for a purely random model), whereas during training the validation loss reaches 2.5. I have tried both leaving `greater_is_better` at its default and setting it to False explicitly, but neither seemed to have any effect. The validation dataset is custom and randomly shuffled; however, in a debug setup (small batch size, short eval intervals) the checkpoint losses do match the in-training ones.
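For context, here is my understanding of the checkpoint-selection semantics I was trying to control with `greater_is_better` (a simplified, self-contained sketch, not Trainer's actual code — the loss values are made up):

```python
import operator

def select_best_checkpoint(metrics, greater_is_better=False):
    """Pick the best checkpoint by the tracked metric.

    Mimics the comparison Trainer applies when load_best_model_at_end
    is used: with greater_is_better=False (the sensible setting for a
    loss), the checkpoint with the LOWEST value wins; with True, the
    highest wins.
    """
    better = operator.gt if greater_is_better else operator.lt
    best_step, best_value = None, None
    for step, value in metrics.items():
        if best_value is None or better(value, best_value):
            best_step, best_value = step, value
    return best_step, best_value

# Hypothetical eval losses logged at a few checkpoint steps.
losses = {500: 4.1, 1000: 3.0, 1500: 2.5, 2000: 2.7}
print(select_best_checkpoint(losses))  # -> (1500, 2.5), lowest loss wins
```

If this matches what Trainer does, toggling `greater_is_better` would only change *which* checkpoint is kept, not the gap between training-time and reloaded losses, which is consistent with the flag having no visible effect in my case.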