I am confused about three arguments (save_steps, eval_steps, and load_best_model_at_end), as explained here by @sgugger.
save_steps doesn't care about the best model, so if I set save_steps=200, a checkpoint is written every 200 steps (200, 400, 600, …), while the model is evaluated every 100 steps (100, 200, 300, …). Now, if the evaluation at step 300 is the best, that model is never saved and is lost.
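To make the scenario concrete, here is a minimal sketch in plain Python (it does not call the Trainer at all). The values save_steps=200 and eval_steps=100 come from the example above; max_steps=600 and the variable names are only assumptions for illustration.

```python
# Values from the example above; max_steps is an assumed training length.
save_steps = 200
eval_steps = 100
max_steps = 600

# Steps at which an evaluation runs and steps at which a checkpoint is written.
eval_points = list(range(eval_steps, max_steps + 1, eval_steps))
save_points = list(range(save_steps, max_steps + 1, save_steps))

# Evaluations that have no checkpoint written at the same step.
unsaved_evals = [step for step in eval_points if step not in save_points]

print("evaluated at:", eval_points)
print("saved at:    ", save_points)
print("evaluated but not saved:", unsaved_evals)
```

If the best evaluation happens at one of the steps in the last list (for example step 300), there is no checkpoint to go back to.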
But if we set load_best_model_at_end=True and keep the same save_steps=200 and eval_steps=100, eval_steps will override save_steps: a checkpoint is saved every 100 steps so that the best model can be loaded at the end.
Here is the question: if all of that is true, why must save_steps be a round multiple of eval_steps when load_best_model_at_end=True is set? It doesn't make sense to me, because when load_best_model_at_end=True, the model doesn't care about save_steps and saves every eval_steps.
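For reference, this is a small sketch of what the "round multiple" condition means in terms of step numbers. It is plain Python with hypothetical helper names (is_round_multiple, checkpoints_without_eval), not the Trainer's actual check, and it only assumes that checkpoints and evaluations each happen at fixed step intervals.

```python
def is_round_multiple(save_steps: int, eval_steps: int) -> bool:
    """Return True when save_steps is an exact multiple of eval_steps."""
    return save_steps % eval_steps == 0


def checkpoints_without_eval(save_steps: int, eval_steps: int, max_steps: int) -> list[int]:
    """List checkpoint steps at which no evaluation ran (hypothetical helper)."""
    eval_points = set(range(eval_steps, max_steps + 1, eval_steps))
    save_points = range(save_steps, max_steps + 1, save_steps)
    return [step for step in save_points if step not in eval_points]


# save_steps=200 is a round multiple of eval_steps=100:
# every checkpoint step is also an evaluation step.
print(is_round_multiple(200, 100), checkpoints_without_eval(200, 100, 600))

# save_steps=150 is not a round multiple of eval_steps=100:
# some checkpoints are written at steps that were never evaluated.
print(is_round_multiple(150, 100), checkpoints_without_eval(150, 100, 600))
```

Under these assumptions, a round multiple guarantees that every saved checkpoint has an evaluation score attached to it, which may be part of why the condition exists, but that is my reading and not something confirmed in the thread.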