I am training a Hugging Face Longformer for a classification problem and got the output below.
- I am confused about `Total optimization steps`. Since I have 7000 training examples, 5 epochs, and `Total train batch size (w. parallel, distributed & accumulation) = 64`, shouldn't I get 7000 * 5 / 64 = 546.875 steps? Why does the log show `Total optimization steps = 545`? (See the arithmetic sketch after these questions.)
- Why, in the output below, does the line `Input ids are automatically padded from 1500 to 1536 to be a multiple of config.attention_window: 512` appear 16 times before the progress bar `[ 23/545 14:24 < 5:58:16, 0.02 it/s, Epoch 0.20/5]`? What are these steps? (The padding target itself is reproduced in the second sketch below.)
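
For reference, here is the arithmetic from the first question as a small Python sketch. The floor-division variant is only my guess at how the Trainer might count update steps; I have not verified it against the transformers source:

```python
import math

num_examples = 7000
num_epochs = 5
per_device_batch_size = 4   # from the log
grad_accum_steps = 16       # from the log

total_batch_size = per_device_batch_size * grad_accum_steps  # 64, matches the log

# What I expected: one optimization step per 64 examples.
print(num_examples * num_epochs / total_batch_size)  # 546.875

# Guess: micro-batches per epoch are floored to whole optimization steps
# before multiplying by the number of epochs.
micro_batches_per_epoch = math.ceil(num_examples / per_device_batch_size)  # 1750
updates_per_epoch = micro_batches_per_epoch // grad_accum_steps            # 109
print(updates_per_epoch * num_epochs)                                      # 545
```

The second print matches the reported `Total optimization steps = 545`, so the discrepancy seems to come from the partial accumulation window at the end of each epoch being dropped, but I would like confirmation.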
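
And a minimal sketch of where 1536 could come from in the repeated padding message, assuming Longformer simply rounds the sequence length up to the next multiple of `config.attention_window` (my own rounding code, not the library's):

```python
import math

attention_window = 512
seq_len = 1500

# Round seq_len up to the nearest multiple of attention_window.
padded_len = math.ceil(seq_len / attention_window) * attention_window
print(padded_len)  # 1536, matching "padded from 1500 to 1536"
```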
==========================================================
***** Running training *****
Num examples = 7000
Num Epochs = 5
Instantaneous batch size per device = 4
Total train batch size (w. parallel, distributed & accumulation) = 64
Gradient Accumulation steps = 16
Total optimization steps = 545
Initializing global attention on CLS token...
Input ids are automatically padded from 1500 to 1536 to be a multiple of `config.attention_window`: 512
Initializing global attention on CLS token...
Input ids are automatically padded from 1500 to 1536 to be a multiple of `config.attention_window`: 512
[the two lines above repeat, 16 times in total]
[ 23/545 14:24 < 5:58:16, 0.02 it/s, Epoch 0.20/5]
Epoch Training Loss Validation Loss