[trainer] 'train_loss' different from 'loss'

Hi all,

I am using the Trainer and training GPT2 from scratch. I have trained it for 50 epochs and during training I had logs like the one shown below:

{'loss': 6.513, 'learning_rate': 1.1749500646222535e-07, 'epoch': 49.99}

However, after the last epoch I get a log with some train metrics:


 ***** train metrics *****
epoch                    =       50.0
train_loss               =     0.0084
train_runtime            = 0:12:31.27
train_samples            =    8716143
train_samples_per_second = 580087.323
train_steps_per_second   =    566.434

Notice that these values are significantly different (6.5 vs. 0.0084). If the final value is the real training loss, then what losses were being logged during training?

Thanks


Hi! I just had this exact question while doing my own training. Did you ever find out the answer?

I have the same problem. Did you find the answer?

For future people coming with the same question:
The final train_loss is the average of all the step losses.
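
As a rough illustration with made-up numbers (assuming equal-length logging intervals), if the 'loss' values printed during training were 6.9, 6.7 and 6.5, the final train_loss would simply be their mean:

logged_losses = [6.9, 6.7, 6.5]   # hypothetical per-interval 'loss' values
train_loss = sum(logged_losses) / len(logged_losses)
print(train_loss)                 # 6.7

So under normal circumstances 'train_loss' should land in the same range as the logged 'loss' values.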


According to trainer.py, the 'train_loss' in the final metrics is the average loss across all steps, while the 'loss' printed at each logging step is the average loss from the previous logging step to the current one.
The large gap between 'train_loss' and 'loss' is probably because you resumed training from a checkpoint: self._total_loss_scalar is reset to zero (it is not stored in the checkpoint), but self.state.global_step is restored correctly. As a result, computing
train_loss = self._total_loss_scalar / self.state.global_step
after resuming from a checkpoint divides only the losses of the resumed portion by the full step count, giving a value (e.g. 0.0084) far below the real average loss.
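
Here is a simplified sketch of that bookkeeping. The variable names mirror the ones in trainer.py, but the loop and the numbers are purely illustrative, not the actual Trainer implementation:

# Illustrative sketch of the Trainer's loss bookkeeping (not the real code).
per_step_losses = [6.9, 6.8, 6.7, 6.6, 6.5, 6.5]  # hypothetical per-step losses
logging_steps = 2                                  # hypothetical logging interval

total_loss_scalar = 0.0  # mirrors self._total_loss_scalar (starts at 0 again after a resume)
global_step = 0          # mirrors self.state.global_step (restored from the checkpoint)
last_logged_step = 0     # mirrors self._globalstep_last_logged
running_loss = 0.0       # loss accumulated since the last log

for step_loss in per_step_losses:
    running_loss += step_loss
    global_step += 1
    if global_step % logging_steps == 0:
        # the 'loss' printed during training: mean over the last logging interval
        print({"loss": round(running_loss / (global_step - last_logged_step), 4)})
        total_loss_scalar += running_loss
        running_loss = 0.0
        last_logged_step = global_step

total_loss_scalar += running_loss  # flush any steps not yet logged
# the 'train_loss' reported at the end: mean over all steps counted by global_step
print({"train_loss": round(total_loss_scalar / global_step, 4)})

When you resume from a checkpoint, total_loss_scalar starts from 0 while global_step is restored to the full step count, so the final division spreads only the resumed run's losses over all 50 epochs' worth of steps, which is how you end up with something like 0.0084.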
