Bug in gradient accumulation in the Hugging Face Trainer's training_step?


Hello, when I use the Hugging Face Trainer with gradient accumulation, the loss is significantly bigger than when I don't use it, so I checked the code. There is something confusing at line 3604 of the Trainer class, in the training_step function: they multiply the loss by gradient_accumulation_steps, and then on return they just divide it by the same gradient_accumulation_steps, so what is the purpose of this after all? (My transformers version is 4.46.1.)
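To make the confusion concrete, here is a toy check in plain PyTorch (not the Trainer's code, just the pattern I described above): the value that is divided back and returned for logging is unchanged by the multiply-then-divide, but the backward() call in between sees the scaled loss, so it is not a pure no-op.

```python
import torch

N = 4  # stand-in for gradient_accumulation_steps
x = torch.tensor(3.0, requires_grad=True)

loss = x ** 2                    # toy loss
scaled = loss * N                # what backward() actually receives
scaled.backward()                # gradient here is N times larger
reported = scaled.detach() / N   # divided back before being returned for logging

print(reported.item())  # 9.0 -> same value as the unscaled loss
print(x.grad.item())    # 24.0 instead of 6.0 -> backward saw the scaled loss
```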


The GitHub version has been slightly changed…
I don’t know if it’s fixed or not.

I don't think the part in question has changed, though…
I wonder what is going on…

When I install the dev version directly from GitHub, the loss is no longer multiplied by the number of gradient accumulation steps, so it seems like some dev messed it up in the 4.46.1 release :sweat_smile: . But do you have any idea why, when a custom loss function is provided, we need to multiply the loss by gradient_accumulation_steps? Moreover, it's crazy that when I update to the 4.47.0.dev0 version the training time drops from 14 hours to 5 hours; I really don't know what is going on here :sweat_smile:
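For comparison, the usual gradient accumulation pattern in plain PyTorch divides the loss by the number of accumulation steps exactly once before backward(), so my guess is that the extra multiplication in the Trainer is there to cancel a division applied somewhere else (e.g. inside accelerator.backward), but I'm not sure. This is just a generic sketch, not the Trainer's implementation:

```python
import torch

def train_with_accumulation(model, optimizer, dataloader, accumulation_steps=4):
    """Generic gradient accumulation loop (not the HF Trainer internals)."""
    model.train()
    optimizer.zero_grad()
    for step, batch in enumerate(dataloader):
        outputs = model(**batch)
        loss = outputs.loss / accumulation_steps  # scale once, before backward
        loss.backward()                           # gradients add up in .grad
        if (step + 1) % accumulation_steps == 0:
            optimizer.step()
            optimizer.zero_grad()
```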


No, I have no idea! :laughing:
I've never used the Trainer myself, but that's beside the point; it just means this is probably a tough issue.

I guess they are trying to fix it…
The recent commits, for example, are attempts at exactly that.
But it's probably not completely fixed yet. They might still be working on it.
Even today, small bugs turn up here and there in various libraries and are routinely fixed. If information doesn't flow smoothly from the user side to the developer side, bugs won't be noticed and fixed…

But I don't think there's a GitHub issue for this question yet…
I've only had a GitHub account for a week, so I can't say for sure…

https://github.com/huggingface/transformers/pulls
