What is the meaning of "steps" parameters?

I’m not sure I understand the meaning of “step” in run_speech_recognition_seq2seq (Trainer)

  1. What is the meaning of a step? What is its relation to epochs?
  2. When save_total_limit = 5,
    does it mean that the best 5 steps (by metric) are always saved, or
    that the last 5 steps are saved (and they may not be the ones with the best metric)?
  1. What is the meaning of a step? What is its relation to epochs?

A “step” (also called a “training step” or “optimization step”) is a single forward pass, backward pass, and parameter update. The model takes in a batch of examples and computes the loss in the forward pass, the gradients are computed in the backward pass, and then the optimizer updates the model’s parameters. All of that happens in a single training step.

The relationship to an epoch is that an epoch is one full pass through the training set (the model has seen every training example once). For example, with 8,000 training examples and a batch size of 8, one epoch consists of 1,000 steps.
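To make the arithmetic concrete, here's a small sketch. The `gradient_accumulation` factor is an assumption beyond the example above (the Trainer only counts an optimization step once every N accumulated batches when it's set):

```python
import math

num_examples = 8000          # size of the training set (from the example above)
per_device_batch_size = 8    # examples per batch
gradient_accumulation = 1    # hypothetical: >1 means one optimizer step per N batches
effective_batch_size = per_device_batch_size * gradient_accumulation

# One epoch = enough steps to see every example once
steps_per_epoch = math.ceil(num_examples / effective_batch_size)
print(steps_per_epoch)  # 1000

# Total optimization steps for a 3-epoch run
num_epochs = 3
print(steps_per_epoch * num_epochs)  # 3000
```

So options like `save_steps=1000` would, in this setup, mean "save once per epoch".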

  2. When save_total_limit = 5,
    does it mean that the best 5 steps (by metric) are always saved, or
    that the last 5 steps are saved (and they may not be the ones with the best metric)?

Quick comment on terminology - the word you’re looking for here is “checkpoints” not steps :slight_smile:

To answer the question, there’s documentation for this here. It says:

When load_best_model_at_end is enabled, the “best” checkpoint according to metric_for_best_model will always be retained in addition to the most recent ones. For example, for save_total_limit=5 and load_best_model_at_end, the four last checkpoints will always be retained alongside the best model.

So you’ll want to set load_best_model_at_end=True in the TrainingArguments, and HF will keep the best checkpoint along with the most recent ones.
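Here's a minimal sketch of what that could look like. The `output_dir` path and the metric choice (`"wer"`, as for a speech recognition run) are placeholders; note that `load_best_model_at_end=True` requires the evaluation and save strategies to match, and that in older transformers releases the `eval_strategy` argument was named `evaluation_strategy`:

```python
from transformers import Seq2SeqTrainingArguments

training_args = Seq2SeqTrainingArguments(
    output_dir="./my-seq2seq-model",  # placeholder path
    save_strategy="steps",
    save_steps=1000,                  # write a checkpoint every 1000 steps
    save_total_limit=5,               # keep at most 5 checkpoints on disk
    eval_strategy="steps",            # must match save_strategy
    eval_steps=1000,
    load_best_model_at_end=True,      # also retain the best checkpoint
    metric_for_best_model="wer",      # e.g. word error rate for ASR
    greater_is_better=False,          # lower WER is better
)
```

With this config, the Trainer keeps the best checkpoint by WER plus the four most recent ones, matching the documentation quoted above.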