Hello,
This is a specific question about the behaviour of the `Trainer` class; I hope someone is able to help. I am trying to plot the training loss after each step, so I have subclassed `Trainer` and made a couple of edits at line 1772 (new lines are commented with `# ADDED LINE`), so that the if statement now reads:
```python
step_losses = []  # ADDED LINE
if (
    ((step + 1) % args.gradient_accumulation_steps != 0)
    and args.local_rank != -1
    and args._no_sync_in_gradient_accumulation
):
    # Avoid unnecessary DDP synchronization since there will be no backward pass on this example.
    with model.no_sync():
        tr_loss_step = self.training_step(model, inputs)
        step_losses.append(tr_loss_step.item())  # ADDED LINE
else:
    tr_loss_step = self.training_step(model, inputs)
    step_losses.append(tr_loss_step.item())  # ADDED LINE
```
and at the end of training I write `step_losses` to disk and plot them.
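(The post doesn't show the save step itself; a minimal sketch of what I mean, assuming the losses are kept in a plain Python list and written out as JSON — the filename `step_losses.json` is just an example:)

```python
import json

# Example values standing in for the per-step losses collected above.
step_losses = [2.31, 2.05, 1.98, 1.87]

# Write the list to disk so it can be plotted later.
with open("step_losses.json", "w") as f:
    json.dump(step_losses, f)

# Reload it for plotting / analysis.
with open("step_losses.json") as f:
    reloaded = json.load(f)
```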
I have noticed that, regardless of the dataset I use, if I calculate a moving average of the loss (for example over a 50-step window, to remove noise), the loss drops sharply at the start of each epoch before stabilising. This is a typical graph without smoothing:

and this is with smoothing (moving average):

On the horizontal axis is the number of steps (in this case 1932 steps per epoch, with 10 epochs shown). You can clearly see the drop at the start of each epoch.
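(For concreteness, the 50-step moving average I mean can be sketched like this; a plain unweighted window via `np.convolve` — the exact smoothing used for the graphs above isn't shown:)

```python
import numpy as np

def moving_average(losses, window=50):
    # Unweighted moving average over `window` consecutive steps.
    # mode="valid" keeps only positions where the window fully overlaps,
    # so the output has len(losses) - window + 1 points.
    return np.convolve(losses, np.ones(window) / window, mode="valid")

# Toy check: a constant loss of 1.0 smooths to (approximately) 1.0 everywhere.
smoothed = moving_average([1.0] * 100, window=50)
```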
My question is: is this behaviour normal? What causes the moving average of the loss to drop so sharply at the start of each epoch? The same thing happens with Hugging Face's native `Trainer` class, so I would rule out any major bug in my own code. Any help is much appreciated.