I’m not sure I understand the meaning of “step” in run_speech_recognition_seq2seq (Trainer).
what is the meaning of a step? what is its relation to epochs?
when save_total_limit = 5,
does it mean that the best 5 steps (by metric) are always saved? or
the last 5 steps are saved (and they may not be the ones with the best metric)?
what is the meaning of a step? what is its relation to epochs?
A “step” (also called a “training step” or “optimization step”) is a single forward pass + backward pass through the model, followed by a parameter update. The model takes in a batch of examples, computes the loss, computes the gradients in the backward pass, and then the optimizer updates the model’s parameters. All of that happens during a single training step.
The relationship to an epoch is that an epoch is one pass through the full training set (the model has seen every training example once). Suppose you have 8000 training examples and a batch size of 8. One epoch will then consist of 1000 steps.
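To make the arithmetic above concrete, here is a tiny sketch (the function name is mine, not part of the Trainer API) that computes steps per epoch from dataset size and batch size:

```python
import math

def steps_per_epoch(num_examples, batch_size):
    # One step consumes one batch; the final batch may be smaller,
    # so we round up.
    return math.ceil(num_examples / batch_size)

# The example from the answer: 8000 examples, batch size 8.
print(steps_per_epoch(8000, 8))  # 1000
```

Note that if you also use gradient accumulation, several batches feed one optimizer update, so the number of optimization steps per epoch shrinks accordingly.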
when save_total_limit = 5,
does it mean that the best 5 steps (by metric) are always saved? or
the last 5 steps are saved (and they may not be the ones with the best metric)?
Quick comment on terminology - the word you’re looking for here is “checkpoints”, not “steps”. save_total_limit limits how many saved checkpoints are kept on disk, not how many steps are run.
To answer the question, there’s documentation for this here. They say:
When load_best_model_at_end is enabled, the “best” checkpoint according to metric_for_best_model will always be retained in addition to the most recent ones. For example, for save_total_limit=5 and load_best_model_at_end, the four last checkpoints will always be retained alongside the best model.
So you’ll want to set load_best_model_at_end=True in the TrainingArguments, and HF will keep the best checkpoint along with the most recent ones.
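For reference, a minimal sketch of what that looks like in TrainingArguments — the output_dir, save_steps value, and metric name are illustrative placeholders, not something from your setup (note that load_best_model_at_end requires the evaluation and save strategies to match):

```python
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="./whisper-finetuned",   # hypothetical path
    evaluation_strategy="steps",        # must match save_strategy below
    save_strategy="steps",
    save_steps=500,                     # checkpoint every 500 steps
    save_total_limit=5,                 # keep at most 5 checkpoints on disk
    load_best_model_at_end=True,        # always retain the best checkpoint too
    metric_for_best_model="wer",        # e.g. word error rate for ASR
    greater_is_better=False,            # lower WER is better
)
```

With this setup, per the docs quoted above, the four most recent checkpoints are kept alongside the best-scoring one.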