Different checkpoint evaluation results when loading a fine-tuned LLM

probably related: