Has anyone noticed different training behavior when using group by length vs. not using it? Does the unbalanced length across batches lead to worse optimizer steps and, as a result, less effective training?