Can someone explain how batches are created when both packing and group_by_length are enabled? As far as I understand, packing concatenates multiple examples into sequences of the maximum length, while group_by_length reduces padding by batching together sequences of similar length. If packing already makes every sequence the same length, it seems group_by_length would have little left to do, or the two settings might interfere with each other. I’m not sure what actually happens when both are active. Does anyone have insights on this?
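To make my mental model concrete, here is a toy sketch in plain Python. It is not how any library implements either option: `pack`, `group_by_length`, and `padding_needed` are hypothetical helpers written only for illustration, the sort-based grouping is a simplification, and the example lengths are made up.

```python
from typing import List

Example = List[int]  # a tokenized example, represented as a list of token ids


def pack(examples: List[Example], max_len: int) -> List[Example]:
    """Toy packing: concatenate everything into one stream, cut into max_len chunks."""
    stream = [tok for ex in examples for tok in ex]
    n_full = len(stream) // max_len  # drop the incomplete tail chunk
    return [stream[i * max_len:(i + 1) * max_len] for i in range(n_full)]


def group_by_length(examples: List[Example], batch_size: int) -> List[List[Example]]:
    """Toy length grouping: sort by length, then slice into batches."""
    ordered = sorted(examples, key=len)
    return [ordered[i:i + batch_size] for i in range(0, len(ordered), batch_size)]


def padding_needed(batches: List[List[Example]]) -> int:
    """Count pad tokens needed to bring each example up to its batch's longest one."""
    return sum(
        max(len(ex) for ex in batch) - len(ex)
        for batch in batches
        for ex in batch
    )


# Six made-up examples with lengths 3, 9, 2, 8, 4, 6.
examples = [[1] * 3, [2] * 9, [3] * 2, [4] * 8, [5] * 4, [6] * 6]

# Batches taken in dataset order, without any grouping.
naive = [examples[i:i + 2] for i in range(0, len(examples), 2)]
# Batches after grouping similar lengths together.
grouped = group_by_length(examples, batch_size=2)
# Packing first, then grouping the packed sequences.
packed = pack(examples, max_len=8)
packed_grouped = group_by_length(packed, batch_size=2)

print("padding, naive batches:   ", padding_needed(naive))
print("padding, grouped batches: ", padding_needed(grouped))
print("packed sequence lengths:  ", sorted({len(seq) for seq in packed}))
print("padding, packed + grouped:", padding_needed(packed_grouped))
```

In this toy model every packed sequence ends up with the same length, so grouping by length has nothing left to reorder and no padding to save. What I can't tell is whether the real implementation behaves the same way when both settings are on.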