Choosing save_steps value and getting the best checkpoint

I’m fine-tuning an ASR model using Seq2SeqTrainer and Seq2SeqTrainingArguments:


from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir                   = "./output_resuts", 
    overwrite_output_dir         = True,
    do_train                     = True,
    do_eval                      = True,    
    per_device_train_batch_size  = 2,
    gradient_accumulation_steps  = 4,
    per_device_eval_batch_size   = 8,    
    learning_rate                = 1e-5,
    warmup_steps                 = 1,    
    save_total_limit             = 3,       
    evaluation_strategy          = "epoch",
    save_strategy                = "epoch",
    logging_strategy             = "epoch",    
    num_train_epochs             = 5,       
    gradient_checkpointing       = True,
    fp16                         = True,    
    predict_with_generate        = True,
    generation_max_length        = 225,           
    report_to                    = ["tensorboard"],
    load_best_model_at_end       = True,
    metric_for_best_model        = "wer",
    greater_is_better            = False,
    push_to_hub                  = False,
)


After running trainer.train(), I get:

image

And 3 checkpoint folders:

image

It seems that epoch number 3 has the best metric (lowest wer).

Now I want to load the best model, i.e. the checkpoint with the best metric (lowest WER).

  1. How can I tell which of the checkpoint folders holds the model with the lowest WER?
  2. Is there a way to get that checkpoint's name from the trainer?
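
For reference, here is a minimal sketch of one way to answer both questions. As far as I know, when metric_for_best_model is set the Trainer records the best checkpoint in its state, so after training trainer.state.best_model_checkpoint and trainer.state.best_metric should hold the folder path and the metric value. The same fields are also written to the trainer_state.json file inside each checkpoint folder, so they can be read without a live trainer. The helper name find_best_checkpoint below is hypothetical, and it assumes the default checkpoint-<step> folder naming.

```python
import json
from pathlib import Path


def find_best_checkpoint(output_dir):
    """Return (best_model_checkpoint, best_metric) as recorded by the Trainer.

    Hypothetical helper: it reads trainer_state.json from the most recent
    checkpoint-<step> folder under output_dir, since the latest state file
    reflects the best checkpoint seen over the whole run.
    """
    # Keep only folders that follow the default "checkpoint-<step>" naming.
    checkpoints = [
        p for p in Path(output_dir).glob("checkpoint-*")
        if p.is_dir() and p.name.split("-")[-1].isdigit()
    ]
    # Sort numerically by step so "checkpoint-1000" comes after "checkpoint-200".
    checkpoints.sort(key=lambda p: int(p.name.split("-")[-1]))

    # Walk from newest to oldest until a state file is found.
    for ckpt in reversed(checkpoints):
        state_file = ckpt / "trainer_state.json"
        if state_file.is_file():
            state = json.loads(state_file.read_text())
            return state.get("best_model_checkpoint"), state.get("best_metric")
    return None, None


# With a live trainer (after trainer.train()), the same values should be
# available directly:
#   trainer.state.best_model_checkpoint
#   trainer.state.best_metric
```

Calling find_best_checkpoint("./output_resuts") on the output directory from the config above would return the path stored by the Trainer together with its WER. Note that with load_best_model_at_end = True the model held by the trainer after training should already be the best one, so the path is mainly useful for reloading it later.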