Behaviour of load_best_model_at_end when max_steps is not a multiple of save_steps

joshdot7 · December 5, 2022, 10:15pm

I note that max_steps is meant to be a multiple of save_steps so that a checkpoint is saved at the end of training. I just wanted to confirm what happens when it isn't. For example, with max_steps=150, save_strategy='steps', save_steps=100 and load_best_model_at_end=True, the final 50 steps of training would essentially be thrown away: no checkpoint is saved at step 150, and the 'best' model that's reloaded (i.e. checkpoint-100) overwrites the model state reached after 150 steps. In other words, save_model just saves checkpoint-100's weights. Thanks!
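
To make the scenario concrete, here is a toy simulation in plain Python. It is not transformers code; it only models the behaviour described in the question, under these assumptions: the "weights" are just a count of optimizer steps taken, a checkpoint is written only when step % save_steps == 0, and at the end the best saved checkpoint is reloaded. The function name simulate_training is hypothetical.

```python
def simulate_training(max_steps: int, save_steps: int,
                      load_best_model_at_end: bool = True) -> int:
    """Toy model (assumption, not transformers code) of the end-of-training state.

    Returns the "weights" that a final save_model() call would write.
    """
    weights = 0
    checkpoints: dict[int, int] = {}  # step -> weights saved at that step
    for step in range(1, max_steps + 1):
        weights += 1  # stand-in for one optimizer step
        if step % save_steps == 0:
            checkpoints[step] = weights  # checkpoint-<step> is written
    if load_best_model_at_end and checkpoints:
        # Assume the eval metric improves monotonically, so the most recent
        # saved checkpoint is the "best" one. With a single checkpoint it is
        # the best regardless of the metric.
        best_step = max(checkpoints)
        weights = checkpoints[best_step]  # reloading discards later updates
    return weights


if __name__ == "__main__":
    print(simulate_training(150, 100))  # 100: steps 101-150 are lost
    print(simulate_training(200, 100))  # 200: last step is checkpointed
    print(simulate_training(150, 100, load_best_model_at_end=False))  # 150
```

With max_steps=150 and save_steps=100 the only checkpoint is at step 100, so reloading it replaces the 150-step state. Making max_steps a multiple of save_steps (e.g. 200) means the last step is always captured.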

sgugger · December 6, 2022, 1:52pm

That’s very true! Do you want to add a warning to the doc?
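
Until such a warning exists, a user could run a pre-flight check of their own before training. The sketch below is a hypothetical helper (check_final_checkpoint is not part of transformers) that takes the three relevant values and emits a standard Python UserWarning when load_best_model_at_end=True and max_steps is not a multiple of save_steps.

```python
import warnings


def check_final_checkpoint(max_steps: int, save_steps: int,
                           load_best_model_at_end: bool) -> bool:
    """Hypothetical check: return False (and warn) if trailing steps won't be saved."""
    if save_steps <= 0:
        raise ValueError("save_steps must be a positive integer")
    if load_best_model_at_end and max_steps % save_steps != 0:
        unsaved = max_steps % save_steps  # steps after the last checkpoint
        warnings.warn(
            f"max_steps={max_steps} is not a multiple of save_steps={save_steps}: "
            f"the final {unsaved} step(s) are not covered by any checkpoint and "
            "will be discarded when load_best_model_at_end reloads an earlier one.",
            UserWarning,
        )
        return False
    return True
```

For the settings in the question, check_final_checkpoint(150, 100, True) warns that the final 50 steps are unsaved and returns False.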
