What do you mean by "It starts at 0"?
The progress bar starts at 0, not at the saved number of steps.
Which version of transformers are you using?
I’m using version 3.3.1.
For example, I had trained the model until it reached step 48000, which took around 5 hours. When I loaded this checkpoint as in the snippet above, it printed this output:
***** Running training *****
Num examples = 66687128
Num Epochs = 10
Instantaneous batch size per device = 32
Total train batch size (w. parallel, distributed & accumulation) = 32
Gradient Accumulation steps = 1
Total optimization steps = 20839730
Continuing training from checkpoint, will skip to saved global_step
Continuing training from epoch 0
Continuing training from global step 48000
Continuing training from 0 non-embedding floating-point operations
Will skip the first 48000 steps in the first epoch
But the progress bar started at 0, and it took another 5 hours to reach step 48000 again; only then did it start logging and saving.
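This matches the "Will skip the first 48000 steps in the first epoch" line in the log: in this Trainer version, resuming does not jump directly to the saved step, but iterates over the dataloader and discards the already-trained batches. A minimal sketch of that skip loop (with hypothetical `dataloader` and `train_step` stand-ins, not the actual Trainer code) shows why the fast-forward itself consumes time:

```python
def resume_training(dataloader, train_step, steps_trained_in_current_epoch):
    """Run train_step on each batch, skipping batches already trained.

    steps_trained_in_current_epoch: number of steps completed before the
    checkpoint was saved (48000 in the example above).
    """
    results = []
    for batch in dataloader:
        # The skipped batches are still loaded from the dataloader; no
        # training happens, but the iteration itself takes time, which is
        # why the progress bar spends hours getting back to step 48000.
        if steps_trained_in_current_epoch > 0:
            steps_trained_in_current_epoch -= 1
            continue
        results.append(train_step(batch))
    return results


# Toy usage: a 10-batch "epoch" resumed after 4 completed steps only
# trains on the remaining 6 batches.
trained = resume_training(range(10), lambda b: b * 2, 4)
print(trained)
```

So the behavior you saw is the expected (if slow) mechanism: the saved global step is honored, but the data pipeline is replayed to get there.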