Trainer: Save Checkpoint After Each Epoch

I am trying to fine-tune a model using the PyTorch Trainer; however, I couldn’t find an option to save a checkpoint after the validation at the end of each epoch.

I could only find “save_steps”, which only saves a checkpoint after a specific number of steps, but I validate the model at the end of each epoch and I want to store the checkpoint at that point.

Any ideas?

Perhaps you could use the Trainer callback mechanism and register a handler for on_epoch_end.
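A minimal sketch of that idea, assuming the callback API in recent Transformers releases (TrainerCallback with an on_epoch_end(args, state, control, **kwargs) hook). The class name SaveAtEpochEndCallback is hypothetical. It only flips the control.should_save flag; as far as I know, the Trainer checks that flag after its end-of-epoch evaluation step, so the checkpoint lands after validation.

```python
from transformers import TrainerCallback


class SaveAtEpochEndCallback(TrainerCallback):
    """Hypothetical callback: request a checkpoint at the end of every epoch."""

    def on_epoch_end(self, args, state, control, **kwargs):
        # Ask the Trainer to save; it acts on this flag in its
        # end-of-epoch log/evaluate/save step (evaluation runs first, if enabled).
        control.should_save = True
        # Returning the control object propagates the change to the Trainer.
        return control
```

You would then pass it in when building the trainer, e.g. Trainer(..., callbacks=[SaveAtEpochEndCallback()]), or add it later with trainer.add_callback(SaveAtEpochEndCallback()).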

If you set the option load_best_model_at_end to True, a checkpoint will be saved at each evaluation (and the Trainer will reload the best model found during fine-tuning at the end of training).


Thanks for the tip.

Thanks a lot @sgugger.
This is exactly what I am looking for.