Save only best model in Trainer

Hey cramraj8, I think that if you use the following in the training config

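save_total_limit=2
save_strategy="steps"
load_best_model_at_end=True

then the best and the latest checkpoints will be kept. Note that save_strategy has to be "steps" or "epoch" here (matching the evaluation strategy), because "no" turns checkpointing off altogether. To tell the two checkpoints apart, compare the step numbers in their directory names: the larger number is the latest iteration.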
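To make the "compare the checkpoint numbers" part concrete, here is a small sketch. It assumes the usual checkpoint-<step> directory naming that Trainer uses inside the output directory; the directory names and the helper names (checkpoint_step, latest_checkpoint) are made up for illustration.

```python
import re
from pathlib import Path

# Trainer names checkpoint folders "checkpoint-<global step>".
CHECKPOINT_RE = re.compile(r"^checkpoint-(\d+)$")


def checkpoint_step(path):
    """Return the step number in a 'checkpoint-<step>' directory name, or None."""
    match = CHECKPOINT_RE.match(Path(path).name)
    return int(match.group(1)) if match else None


def latest_checkpoint(paths):
    """Return the path with the highest step number, or None if none match."""
    numbered = [(checkpoint_step(p), p) for p in paths]
    numbered = [(step, p) for step, p in numbered if step is not None]
    if not numbered:
        return None
    # Compare as integers: as strings, "checkpoint-1500" sorts before "checkpoint-500".
    return max(numbered, key=lambda item: item[0])[1]


# Hypothetical directory names, for illustration only.
dirs = ["out/checkpoint-1500", "out/checkpoint-500"]
print(latest_checkpoint(dirs))  # out/checkpoint-1500
```

The step numbers are compared as integers on purpose, since a plain string sort would put checkpoint-1500 before checkpoint-500.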

Alternatively, with load_best_model_at_end=True in the config, you can read trainer.state.best_model_checkpoint after training to get the path of the best checkpoint, and from that you can infer that the other checkpoint directory contains the latest model.
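A rough sketch of that inference, kept free of any transformers import so it runs on its own. The paths below are hypothetical; in a real run the best path would come from trainer.state.best_model_checkpoint, and other_checkpoints is just an illustrative helper name.

```python
import os


def other_checkpoints(checkpoint_dirs, best_checkpoint):
    """Return every checkpoint directory that is not the best one."""
    # Normalise paths so a trailing slash does not break the comparison.
    best = os.path.normpath(best_checkpoint)
    return [d for d in checkpoint_dirs if os.path.normpath(d) != best]


# Hypothetical values, for illustration only.
dirs = ["out/checkpoint-500", "out/checkpoint-1500"]
best = "out/checkpoint-500/"
print(other_checkpoints(dirs, best))  # ['out/checkpoint-1500']
```

If the best checkpoint also happens to be the latest one, there is only one directory and the returned list is empty, so it is worth handling that case.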

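This is not exact, but if you use save_strategy="steps" and save_steps=NUMBER, it seems that the number of training examples processed between two checkpoints is approximately save_steps multiplied by the batch size defined in per_device_train_batch_size (on a single device without gradient accumulation). The total number of steps itself depends on the dataset size, the batch size, and the number of epochs, not on save_steps.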
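As a rough sanity check on the step arithmetic, here is a sketch under some loud assumptions: a single device, no gradient accumulation, and the last partial batch is kept. The dataset size, batch size, and epoch count are invented numbers, and the helper names are illustrative only.

```python
import math


def steps_per_epoch(num_examples, per_device_train_batch_size):
    """Optimizer steps per epoch, assuming one device and no gradient accumulation."""
    return math.ceil(num_examples / per_device_train_batch_size)


def examples_between_saves(save_steps, per_device_train_batch_size):
    """Training examples processed between two consecutive checkpoints."""
    return save_steps * per_device_train_batch_size


def save_points(total_steps, save_steps):
    """Step numbers at which a checkpoint would be written."""
    return list(range(save_steps, total_steps + 1, save_steps))


# Invented numbers, for illustration only.
num_examples, batch_size, epochs, save_steps = 10_000, 8, 3, 500

total_steps = steps_per_epoch(num_examples, batch_size) * epochs
print(total_steps)                                     # 3750
print(examples_between_saves(save_steps, batch_size))  # 4000
print(save_points(total_steps, save_steps))            # [500, 1000, ..., 3500]
```

With several GPUs or gradient accumulation, the effective batch size per step is larger, so these numbers would need to be scaled accordingly.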