Has anyone noticed different behavior when using group_by_length vs. not using it? Do the unbalanced batch lengths cause worse optimizer steps and make training less effective?
Had a similar issue: my training loss would spike at intervals of exactly 50 steps. I was going crazy until I finally figured out that group_by_length may cause issues like mine. The solution is to set group_by_length=False.
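For reference, this is roughly where the flag lives in the `transformers` API (a minimal sketch; the `output_dir` value is just a placeholder):

```python
from transformers import TrainingArguments

# group_by_length=False restores plain shuffled batching instead of
# length-grouped batching (it is also the default).
args = TrainingArguments(
    output_dir="out",
    group_by_length=False,
)
```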
Same here. I had to disable group_by_length because my training loss was jumping up and down and my evaluation loss was not dropping. I think it's because there is huge variance in the maximum length of my sequences (I'm training a Longformer), so each batch had very different sequence lengths.
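That batch-to-batch variance is easy to reproduce with a toy simulation. This is just an illustrative sketch (not the actual LengthGroupedSampler in transformers, and the lengths and batch size are made up): with length grouping, each batch is internally uniform, but the *mean* length swings wildly from one batch to the next, which is consistent with loss jumping up and down.

```python
import random
import statistics

def make_batches(lengths, batch_size, group_by_length):
    """Sketch of length-grouped vs. shuffled batching."""
    idx = list(range(len(lengths)))
    if group_by_length:
        # Sort by length so similar-length examples share a batch.
        idx.sort(key=lambda i: lengths[i])
    else:
        random.shuffle(idx)
    return [idx[i:i + batch_size] for i in range(0, len(idx), batch_size)]

random.seed(0)
# Simulate a dataset with high length variance, like the Longformer case.
lengths = [random.randint(16, 4096) for _ in range(512)]

for grouped in (True, False):
    batches = make_batches(lengths, 32, grouped)
    means = [statistics.mean(lengths[i] for i in b) for b in batches]
    print(f"group_by_length={grouped}: per-batch mean length spans "
          f"{min(means):.0f}..{max(means):.0f}")
```

Running this shows the grouped batches ranging from all-short to all-long, while shuffled batches all have similar mean lengths, so each optimizer step sees comparable data.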