Grouping by length makes training loss oscillate and makes evaluation loss worse

ssharpe42 · April 18, 2024, 7:52pm

Has anyone noticed different behavior with group_by_length enabled versus disabled? Do the unbalanced batch lengths lead to worse optimizer steps, at the price of less effective training?

javier-cohere · May 16, 2025, 5:28pm

Had a similar issue: my training loss would spike at intervals of exactly 50 steps. I was going crazy until I finally figured out that group_by_length can cause issues like mine.

The solution is to set group_by_length=False
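
For anyone looking for where that flag goes, here is a minimal sketch. It assumes the Hugging Face Trainer, where group_by_length is a field of TrainingArguments (it defaults to False, so removing an explicit True has the same effect). The output_dir and batch size values are placeholders, not settings from this thread.

```python
# Keyword arguments intended for transformers.TrainingArguments.
# "out" and the batch size of 8 are placeholder values for illustration.
training_kwargs = {
    "output_dir": "out",
    "per_device_train_batch_size": 8,
    "group_by_length": False,  # fall back to plain random sampling
}

# With transformers installed, these would be passed on like this:
# from transformers import TrainingArguments
# args = TrainingArguments(**training_kwargs)
```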

GuyShur · June 3, 2025, 5:54pm

Same here. I had to disable group_by_length because my training loss was jumping up and down and my evaluation loss was not dropping. I think it's because the maximum sequence length varies hugely across my data (I'm training a Longformer), so each batch had a very different sequence length.
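
The effect described here can be reproduced without a model. Below is a rough sketch, not the Trainer's actual code: it assumes length grouping works by shuffling the dataset, cutting it into "megabatches" of 50 batches each, and sorting every megabatch from longest to shortest. The 50-batch megabatch, the synthetic lengths, and the helper names are all assumptions made for illustration. If the real sampler behaves this way, it would also account for the spikes at exactly 50 steps reported above.

```python
import random


def length_grouped_indices(lengths, batch_size, mega_batch_mult=50, seed=0):
    """Simplified sketch of length-grouped sampling.

    Shuffle all indices, cut them into megabatches of
    mega_batch_mult * batch_size examples, then sort each megabatch
    by sequence length, longest first.
    """
    rng = random.Random(seed)
    indices = list(range(len(lengths)))
    rng.shuffle(indices)
    mega = mega_batch_mult * batch_size
    ordered = []
    for start in range(0, len(indices), mega):
        chunk = indices[start:start + mega]
        chunk.sort(key=lambda i: lengths[i], reverse=True)
        ordered.extend(chunk)
    return ordered


def batch_max_lengths(lengths, order, batch_size):
    """Padded length (max sequence length) of each consecutive batch."""
    return [
        max(lengths[i] for i in order[s:s + batch_size])
        for s in range(0, len(order), batch_size)
    ]


# Synthetic dataset with a wide spread of lengths (hypothetical values).
rng = random.Random(1)
batch_size = 8
lengths = [rng.randint(16, 4096) for _ in range(batch_size * 50 * 4)]

grouped = batch_max_lengths(
    lengths, length_grouped_indices(lengths, batch_size), batch_size
)

shuffled_order = list(range(len(lengths)))
rng.shuffle(shuffled_order)
shuffled = batch_max_lengths(lengths, shuffled_order, batch_size)

# Grouped: lengths fall steadily for 50 steps, then jump back up.
print("grouped, steps 0/49/50:", grouped[0], grouped[49], grouped[50])
# Shuffled: every batch has roughly the same padded length.
print("shuffled, min/max:", min(shuffled), max(shuffled))
```

With grouping, consecutive batches drift from very long to very short sequences and then reset, so the loss is computed on a different slice of the length distribution at every step. With plain shuffling, each batch mixes lengths and looks much more like the others.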
