I am confused about these three arguments. As explained here by @sgugger, `save_steps` doesn’t care about the best model, so if I set `eval_steps=100` and `save_steps=200`, there is a checkpoint every 200 steps (200, 400, 600, …) but an evaluation of the model every 100 steps (100, 200, 300, …). Now, if the evaluation at step 300 is the best, that model is not saved and is lost.
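To make that schedule concrete, here is a minimal sketch in plain Python. It is not Trainer code; the `schedule` helper and `max_steps=600` are made up for illustration, and it only lists which steps are evaluated and which are checkpointed under the settings above.

```python
def schedule(eval_steps, save_steps, max_steps):
    """Return (evaluation steps, checkpoint steps) up to max_steps inclusive."""
    evals = list(range(eval_steps, max_steps + 1, eval_steps))
    saves = list(range(save_steps, max_steps + 1, save_steps))
    return evals, saves


# Hypothetical run length of 600 steps, chosen only for the example.
evals, saves = schedule(eval_steps=100, save_steps=200, max_steps=600)

# Steps that are evaluated but have no checkpoint on disk.
unsaved = [step for step in evals if step not in saves]

print(evals)    # [100, 200, 300, 400, 500, 600]
print(saves)    # [200, 400, 600]
print(unsaved)  # [100, 300, 500]
```

Step 300 shows up in `unsaved`, which is the case I am worried about.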
But if we set `load_best_model_at_end=True` and keep `eval_steps=100`, `save_steps=200`, then `eval_steps` will override `save_steps`: a checkpoint is saved every 100 steps so that the best model can be loaded at the end.
Here is the question: if everything I said is true, why, when `load_best_model_at_end=True` is set, should `save_steps` be a round multiple of `eval_steps`? It doesn’t make sense to me, because when `load_best_model_at_end` is `True`, `save_steps` is ignored and a checkpoint is saved every `eval_steps`.
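For reference, the arithmetic behind "round multiple" can be sketched as follows. Again this is plain Python with a made-up helper name, not what the library does internally; it only shows which checkpoint steps would coincide with an evaluation step for a given pair of values.

```python
def checkpoints_on_eval_steps(eval_steps, save_steps, max_steps):
    """Split checkpoint steps into those that fall on an evaluation step and those that do not."""
    saves = range(save_steps, max_steps + 1, save_steps)
    aligned = [step for step in saves if step % eval_steps == 0]
    misaligned = [step for step in saves if step % eval_steps != 0]
    return aligned, misaligned


# save_steps is a round multiple of eval_steps: every checkpoint has an evaluation.
print(checkpoints_on_eval_steps(100, 200, 600))  # ([200, 400, 600], [])

# save_steps is not a multiple: some checkpoints have no evaluation at that step.
print(checkpoints_on_eval_steps(100, 150, 600))  # ([300, 600], [150, 450])
```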