What do you mean by "It starts at 0"?
The progress bar starts at 0, not at the saved number of steps.
Which version of transformers are you using?
I’m using version 3.3.1.
For example, I had trained the model until it reached step 48000, which took around 5 hours. When I loaded this checkpoint as in the snippet above, it printed this output:
***** Running training *****
Num examples = 66687128
Num Epochs = 10
Instantaneous batch size per device = 32
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 1
Total optimization steps = 20839730
Continuing training from checkpoint, will skip to saved global_step
Continuing training from epoch 0
Continuing training from global step 48000
Continuing training from 0 non-embedding floating-point operations
Will skip the first 48000 steps in the first epoch
But the progress bar started at 0, and it took another 5 hours to reach step 48000 again; only then did it start logging and saving.
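This matches the "Will skip the first 48000 steps in the first epoch" line in the log: in this Trainer version, resuming does not jump directly to the saved step, but iterates over the dataloader and discards the already-trained batches. A minimal sketch of that skip loop (with hypothetical `dataloader` and `train_step` stand-ins, not the actual Trainer code) shows why the fast-forward itself consumes time:

```python
def resume_training(dataloader, train_step, steps_trained_in_current_epoch):
    """Run train_step on each batch, skipping batches already trained.

    steps_trained_in_current_epoch: number of steps completed before the
    checkpoint was saved (48000 in the example above).
    """
    results = []
    for batch in dataloader:
        # The skipped batches are still loaded from the dataloader; no
        # training happens, but the iteration itself takes time, which is
        # why the progress bar spends hours getting back to step 48000.
        if steps_trained_in_current_epoch > 0:
            steps_trained_in_current_epoch -= 1
            continue
        results.append(train_step(batch))
    return results


# Toy usage: a 10-batch "epoch" resumed after 4 completed steps only
# trains on the remaining 6 batches.
trained = resume_training(range(10), lambda b: b * 2, 4)
print(trained)
```

So the behavior you saw is the expected (if slow) mechanism: the saved global step is honored, but the data pipeline is replayed to get there.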