A question about the training_step function in the Trainer class

I’m reading the source code of the `Trainer` class and have a question about the `training_step` function. In this function, the backward pass appears to be performed before the function returns the loss: after computing the loss, the code calls `scaled_loss.backward()` (or `self.accelerator.backward(loss)`) and then returns `loss.detach() / self.args.gradient_accumulation_steps`.
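For context, the part I’m asking about looks roughly like this (my own paraphrase of the source, with most details elided, not the exact code):

```python
def training_step(self, model, inputs):
    model.train()
    inputs = self._prepare_inputs(inputs)
    loss = self.compute_loss(model, inputs)
    # the backward pass happens *inside* training_step
    self.accelerator.backward(loss)
    # only the detached, scaled loss is returned
    return loss.detach() / self.args.gradient_accumulation_steps
```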

Does this mean the function first backpropagates the computed loss and only then divides the (detached) loss by `self.args.gradient_accumulation_steps` for the return value? How does this fit into the gradient accumulation strategy? My current understanding is sketched below.
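For comparison, here is how I understand gradient accumulation in plain PyTorch (a minimal, hypothetical sketch; the model, optimizer, and batches are made up purely for illustration):

```python
import torch
from torch import nn

# Hypothetical setup, just to illustrate the pattern
model = nn.Linear(10, 1)
optimizer = torch.optim.SGD(model.parameters(), lr=0.1)
batches = [(torch.randn(8, 10), torch.randn(8, 1)) for _ in range(8)]

accum_steps = 4  # plays the role of args.gradient_accumulation_steps

optimizer.zero_grad()
for step, (x, y) in enumerate(batches):
    loss = nn.functional.mse_loss(model(x), y)
    # scale so the accumulated gradient averages over the effective batch;
    # backward() *adds* to .grad, it does not overwrite it
    (loss / accum_steps).backward()
    if (step + 1) % accum_steps == 0:
        optimizer.step()       # one parameter update per accum_steps micro-batches
        optimizer.zero_grad()
```

If `training_step` follows the same idea, then the `backward()` call inside it accumulates gradients, the actual `optimizer.step()` happens in the outer training loop, and the division by `gradient_accumulation_steps` only affects the detached value returned for logging. Is that right?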

The function’s source code is quite short, so I hope you can help me take a look.
Thanks for your help!
Link to the source code of the training_step function.
