Hey cramraj8, I think that if you use the following in the training config
save_total_limit=2
save_strategy="steps" (or "epoch")
together with load_best_model_at_end=True, then the best and the latest models will be saved. You can compare the checkpoint numbers of these two models: the directory with the larger number is essentially the latest iteration.
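Something like this, as a minimal sketch (output_dir, the step counts, the batch size and the metric are placeholder values, adjust them to your setup):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",              # checkpoints land in out/checkpoint-<step>
    save_strategy="steps",         # save a checkpoint every save_steps steps
    save_steps=500,
    evaluation_strategy="steps",   # evaluation is needed to define a "best" model
    eval_steps=500,                # should line up with save_steps for load_best_model_at_end
    save_total_limit=2,            # keep at most two checkpoints on disk
    load_best_model_at_end=True,   # the best checkpoint is kept alongside the latest one
    metric_for_best_model="loss",
    per_device_train_batch_size=8,
)
```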
Since load_best_model_at_end=True is in the config as well, you can also read trainer.state.best_model_checkpoint
after training to get the path (and hence the step number) of the best checkpoint, and from that you can infer that the other checkpoint directory contains the latest model.
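Here is a small sketch of how that lookup could look, assuming the trainer and training_args objects from the snippet above and that checkpoints sit directly under output_dir:

```python
import os
import re

# After trainer.train() has finished:
best_ckpt = trainer.state.best_model_checkpoint   # e.g. "out/checkpoint-1500"
best_step = int(best_ckpt.split("-")[-1])

# Scan the output directory for checkpoint-<step> folders and pick the newest one.
ckpt_dirs = [
    d for d in os.listdir(training_args.output_dir)
    if re.fullmatch(r"checkpoint-\d+", d)
]
latest_ckpt = max(ckpt_dirs, key=lambda d: int(d.split("-")[-1]))
latest_step = int(latest_ckpt.split("-")[-1])

print(f"best checkpoint at step {best_step}, latest checkpoint at step {latest_step}")
```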
This is not exact, but if you use save_strategy="steps"
and save_steps=NUMBER, each saved checkpoint directory is named checkpoint-<step>, where <step> is the number of optimization steps completed up to that point, so the number of training examples seen so far is approximately that step count multiplied by the batch size defined in per_device_train_batch_size
(and by the number of devices and gradient_accumulation_steps, if you use them).
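As a back-of-the-envelope sketch of that estimate (assuming a single device and no gradient accumulation; the numbers are made up):

```python
# Rough estimate, assuming one device and no gradient accumulation.
per_device_train_batch_size = 8
checkpoint_step = 1500        # taken from the checkpoint-1500 directory name

examples_seen = checkpoint_step * per_device_train_batch_size
print(examples_seen)          # 12000 training examples processed up to that checkpoint
```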