[trainer] 'train_loss' different from 'loss'

Hi all,

I am using the Trainer to train GPT-2 from scratch. I trained it for 50 epochs, and during training I got logs like the one below:

{'loss': 6.513, 'learning_rate': 1.1749500646222535e-07, 'epoch': 49.99}

However, after the last epoch I get a log with the train metrics:

***** train metrics *****
epoch                    =       50.0
train_loss               =     0.0084
train_runtime            = 0:12:31.27
train_samples            =    8716143
train_samples_per_second = 580087.323
train_steps_per_second   =    566.434

Notice that the two values differ by almost three orders of magnitude (6.513 vs. 0.0084). If 'train_loss' is the real training loss, then what is the 'loss' that the logs were reporting during training?

Thanks

Hi! I just had this exact question while doing my own training. Did you ever find out the answer?
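
Not a definitive answer, but here is one explanation that would be consistent with the numbers above. As far as I understand the Trainer (worth checking against your installed version), the 'loss' in the periodic logs is an average over the steps since the previous log, while the final 'train_loss' is the loss accumulated in the current process divided by the global step count. If the run was resumed from a checkpoint (an assumption, the post does not say so), the sum would only cover the steps run after resuming while the divisor still counts all steps, which would shrink 'train_loss'. The metrics hint at this: 580087 samples/s over a 12.5 minute runtime comes to about 50 x 8716143 samples, so the throughput seems to count all 50 epochs while the runtime covers only the last session. The sketch below is a toy model of the two averages in plain Python, not the Trainer's actual code; `simulate`, `resumed_at_step` and the step counts are made up for illustration.

```python
def simulate(per_step_losses, logging_steps, resumed_at_step=0):
    """Toy model of two loss averages (hypothetical, not the Trainer's code)."""
    total_steps = len(per_step_losses)

    # Windowed average: mean over the last `logging_steps` steps,
    # analogous to the 'loss' printed in the periodic logs.
    window = per_step_losses[-logging_steps:]
    logged_loss = sum(window) / len(window)

    # Run-level average: the sum only covers steps executed in this session
    # (after the assumed resume point), but the divisor is the global step count.
    session_sum = sum(per_step_losses[resumed_at_step:])
    train_loss = session_sum / total_steps

    return logged_loss, train_loss


# Illustrative numbers only: a flat per-step loss of 6.5 over 10,000 steps.
losses = [6.5] * 10_000

fresh = simulate(losses, logging_steps=500)
resumed = simulate(losses, logging_steps=500, resumed_at_step=9_987)

print(fresh)    # both averages agree when nothing was resumed
print(resumed)  # windowed average unchanged, run-level average is tiny
```

With these made-up numbers, the uninterrupted run gives 6.5 for both averages, while the resumed run still logs 6.5 but reports a run-level average of roughly 0.0085. If your run was not resumed, this explanation does not apply and something else is going on.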