I am fine-tuning a Llama-2-7b-hf model on my custom dataset. However, the train and eval losses are different every time I re-run the training with the Hugging Face Trainer. I set the seed prior to model training using the set_seed function and also passed the same seed as an argument to the Trainer.
I tested the same code with a Mistral model and could not reproduce this behavior there. Any idea what could cause this difference?