Why is the accuracy of a fine-tuned model lower when evaluated after loading from disk than during training?

SyedaSadaf · October 31, 2022, 6:05am

I am fine-tuning a transformer model and evaluating it at the end of each epoch during training. The best model is the one with the highest evaluation accuracy across all epochs. Once training completes and the best model is saved to disk, I load it and try to reproduce that validation accuracy, but I cannot match the value reported during training: accuracy is 3% to 4% lower on the same evaluation data.

(To reproduce the result, I call the same evaluation function and pass it the model and the dataset. Nothing else changed between the two evaluations.)
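
For reference, the save/reload round trip described above can be sketched as follows. This is only a minimal illustration, assuming a PyTorch model saved via `state_dict`; the toy model, the random data, the `evaluate` and `make_model` helpers, and the file name `best_model.pt` are all hypothetical stand-ins, not the actual setup from this post. It shows the baseline expectation: when the weights are identical and the model is in eval mode, accuracy should match exactly.

```python
import os
import tempfile

import torch
from torch import nn


def evaluate(model: nn.Module, inputs: torch.Tensor, labels: torch.Tensor) -> float:
    """Return accuracy with dropout disabled and gradient tracking off."""
    model.eval()  # switches dropout (and batch norm, if any) to inference behaviour
    with torch.no_grad():
        preds = model(inputs).argmax(dim=-1)
    return (preds == labels).float().mean().item()


def make_model() -> nn.Module:
    # Hypothetical stand-in for the fine-tuned transformer.
    return nn.Sequential(
        nn.Linear(8, 16), nn.ReLU(), nn.Dropout(0.5), nn.Linear(16, 3)
    )


torch.manual_seed(0)
inputs = torch.randn(64, 8)            # toy evaluation inputs
labels = torch.randint(0, 3, (64,))    # toy evaluation labels

model = make_model()
acc_in_memory = evaluate(model, inputs, labels)

# Save the weights, then load them into a freshly constructed model.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "best_model.pt")
    torch.save(model.state_dict(), path)
    reloaded = make_model()
    reloaded.load_state_dict(torch.load(path))

acc_reloaded = evaluate(reloaded, inputs, labels)

# Compare every saved tensor to confirm the round trip preserved the weights.
weights_match = all(
    torch.equal(a, b)
    for a, b in zip(model.state_dict().values(), reloaded.state_dict().values())
)
print(acc_in_memory, acc_reloaded, weights_match)
```

If a comparison like this gives different accuracies in a real setup, things worth checking include whether the saved weights are really those of the best epoch rather than the last one, whether the model is in eval mode at reload time, and whether the evaluation data and preprocessing are identical in both runs.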
