This is a specific question about the behaviour of the `Trainer` class; I hope someone is able to help. I am trying to plot the training loss after each step, so I have subclassed `Trainer` and made a couple of edits at line 1772 (new lines commented with `# ADDED LINE`). The if statement now reads:
```python
step_losses = []  # ADDED LINE (list initialised before the loop)

if (
    ((step + 1) % args.gradient_accumulation_steps != 0)
    and args.local_rank != -1
    and args._no_sync_in_gradient_accumulation
):
    # Avoid unnecessary DDP synchronization since there will be
    # no backward pass on this example.
    with model.no_sync():
        tr_loss_step = self.training_step(model, inputs)
        step_losses.append(tr_loss_step.item())  # ADDED LINE
else:
    tr_loss_step = self.training_step(model, inputs)
    step_losses.append(tr_loss_step.item())  # ADDED LINE
```
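For what it's worth, the same losses can also be collected without editing the training loop, through the public callback API. Below is a minimal sketch, assuming `logging_steps=1` so that `on_log` fires at every step; the class name `StepLossRecorder` is mine, not part of the library:

```python
from transformers import TrainerCallback

class StepLossRecorder(TrainerCallback):
    """Collects the training loss reported at each logging step."""

    def __init__(self):
        self.step_losses = []

    def on_log(self, args, state, control, logs=None, **kwargs):
        # During training the Trainer logs a dict such as
        # {"loss": ..., "learning_rate": ..., "epoch": ...}
        # every `logging_steps` optimizer steps.
        if logs is not None and "loss" in logs:
            self.step_losses.append(logs["loss"])
```

It is attached with `Trainer(..., callbacks=[StepLossRecorder()])`. One caveat: the logged `"loss"` is averaged per optimizer step, so with gradient accumulation it is not quite the same as the per-micro-batch values captured above.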
At the end of training, I write `step_losses` to disk and plot them.
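The dump itself is nothing fancy, roughly like this (a sketch; the file name is arbitrary):

```python
import json

# Persist the per-step losses so they can be plotted later.
with open("step_losses.json", "w") as f:
    json.dump(step_losses, f)
```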
I have noticed that, regardless of the dataset I use, if I take a moving average of the loss (for example over a 50-step window, to remove noise), the loss drops sharply at the start of each epoch before stabilising. This is a typical graph without smoothing:
and this is the same run with smoothing (moving average):
The horizontal axis is the step number (in this case 1932 steps per epoch, with 10 epochs shown). You can clearly see the drop at the start of each epoch.
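For reference, the smoothing is a plain moving average, computed along these lines (a sketch with numpy/matplotlib, using the 50-step window mentioned above):

```python
import json

import matplotlib.pyplot as plt
import numpy as np

with open("step_losses.json") as f:
    losses = np.array(json.load(f))

window = 50
# Simple moving average: each point is the mean of the previous `window` losses.
smoothed = np.convolve(losses, np.ones(window) / window, mode="valid")

plt.plot(smoothed)
plt.xlabel("step")
plt.ylabel("training loss (moving average)")
plt.show()
```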
My question is: is this behaviour normal? What is causing the moving average of the loss to drop so sharply at the start of each epoch? It also happens with HuggingFace's native `Trainer` class, so I would rule out any major bug in my code. Any help is much appreciated.