set_transform and group_by_length=True

To my understanding, set_transform applies transformations on the fly, so each example is processed only when it is needed and the GPU can use it for training right away.
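A minimal pure-Python sketch of that lazy behaviour follows. `LazyDataset` and `tokenize` are hypothetical toy stand-ins, not the `datasets` implementation. They only mimic the idea that a transform registered with `set_transform` runs when an item is accessed, not when it is registered, and that it receives a batch (a dict of lists).

```python
from typing import Callable, Dict, List, Optional

Batch = Dict[str, List]


class LazyDataset:
    """Hypothetical toy stand-in for a dataset with an on-the-fly transform."""

    def __init__(self, rows: Batch) -> None:
        self.rows = rows
        self.transform: Optional[Callable[[Batch], Batch]] = None
        self.transform_calls = 0  # how many times the transform has run

    def set_transform(self, transform: Callable[[Batch], Batch]) -> None:
        # Registering the transform does no work; it is only stored.
        self.transform = transform

    def __len__(self) -> int:
        return len(next(iter(self.rows.values())))

    def __getitem__(self, i: int) -> Dict:
        # Build a batch of one raw example, as a batched transform expects.
        batch = {k: [v[i]] for k, v in self.rows.items()}
        if self.transform is not None:
            self.transform_calls += 1
            batch = self.transform(batch)
        # Unwrap the batch of one back into a single example.
        return {k: v[0] for k, v in batch.items()}


def tokenize(batch: Batch) -> Batch:
    # Hypothetical "tokenizer": one id (the word length) per whitespace-separated word.
    return {"input_ids": [[len(w) for w in text.split()] for text in batch["text"]]}


ds = LazyDataset({"text": ["a short one", "a much longer example sentence", "hi"]})
ds.set_transform(tokenize)
print(ds.transform_calls)  # 0: nothing has been transformed yet
print(ds[1]["input_ids"])  # [1, 4, 6, 7, 8]
print(ds.transform_calls)  # 1: only the accessed example was transformed
```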

When I set group_by_length=True in the Trainer, set_transform no longer evaluates lazily: it goes through the whole dataset first. My hunch is that all the transformations have to be applied up front to be able to group by length.
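The hunch can be illustrated with a toy sketch. It assumes, as a simplification, that the length-grouping sampler obtains each example's length by indexing the dataset and measuring a tokenized field such as `input_ids`. `CountingLazyDataset` and `tokenize` are hypothetical. With a lazy transform, that single pass over every index forces the transform to run on the whole dataset before any batch can be ordered.

```python
from typing import Callable, Dict, List


class CountingLazyDataset:
    """Hypothetical dataset that applies `transform` on every item access."""

    def __init__(self, texts: List[str], transform: Callable[[str], Dict]) -> None:
        self.texts = texts
        self.transform = transform
        self.transform_calls = 0

    def __len__(self) -> int:
        return len(self.texts)

    def __getitem__(self, i: int) -> Dict:
        self.transform_calls += 1  # the transform runs on every access
        return self.transform(self.texts[i])


def tokenize(text: str) -> Dict:
    # Hypothetical tokenizer: one id per whitespace-separated word.
    return {"input_ids": list(range(len(text.split())))}


texts = ["one", "two words", "now three words", "and now four words"]
ds = CountingLazyDataset(texts, tokenize)

# Assumed step a length-grouping sampler must take before it can order anything:
lengths = [len(ds[i]["input_ids"]) for i in range(len(ds))]
print(lengths)             # [1, 2, 3, 4]
print(ds.transform_calls)  # 4: every example was transformed up front
```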

Is this behavior intended? I think group_by_length should be limited to the batch size (or a smaller subset of the dataset) rather than the whole dataset.

No, group_by_length needs to read the lengths of all examples in the dataset to be able to build batches of similar length.
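A simplified sketch of why all lengths are required: sort every index by length, cut the sorted order into batches, then shuffle the batch order. `length_grouped_batches` is a hypothetical illustration, not the `transformers` sampler, whose details differ. The point it shows is that the sort cannot start until the complete `lengths` list exists.

```python
import random
from typing import List


def length_grouped_batches(lengths: List[int], batch_size: int, seed: int = 0) -> List[List[int]]:
    """Group example indices into batches of similar length (simplified sketch).

    Requires the length of *every* example up front: the sort below
    cannot begin until the whole `lengths` list has been computed.
    """
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    batches = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]
    # Shuffle the batch order so training does not always see short batches first.
    random.Random(seed).shuffle(batches)
    return batches


lengths = [5, 42, 7, 40, 6, 41]
batches = length_grouped_batches(lengths, batch_size=3)
for b in batches:
    print(b, [lengths[i] for i in b])  # each batch holds either the short or the long examples
```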

Is there a way to restrict group_by_length to smaller subsets without having to shard the dataset?

No, this is not implemented.
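For illustration only, here is a hypothetical sketch of what the question describes: grouping by length inside consecutive fixed-size windows, so lengths (and therefore transforms) are needed for only `window_size` examples at a time. `windowed_length_batches` and `lazy_examples` are not part of `transformers` or `datasets`. The sketch assumes examples arrive as an iterable that builds each item on demand.

```python
from itertools import islice
from typing import Callable, Dict, Iterable, Iterator, List


def windowed_length_batches(
    examples: Iterable[Dict],
    length_fn: Callable[[Dict], int],
    batch_size: int,
    window_size: int,
) -> Iterator[List[Dict]]:
    """Hypothetical: group by length only inside consecutive windows.

    Only `window_size` examples are pulled (and thus transformed) at a time,
    instead of computing lengths for the whole dataset first.
    """
    it = iter(examples)
    while True:
        window = list(islice(it, window_size))  # pulls at most window_size items
        if not window:
            return
        window.sort(key=length_fn)
        for start in range(0, len(window), batch_size):
            yield window[start:start + batch_size]


pulled = []  # records which raw examples have been materialized so far


def lazy_examples(texts: List[str]) -> Iterator[Dict]:
    # Stands in for a lazily transformed dataset: each item is built on demand.
    for t in texts:
        pulled.append(t)
        yield {"input_ids": t.split()}


texts = ["a b c", "a", "a b c d", "a b", "x y z w v", "x"]
batches = windowed_length_batches(
    lazy_examples(texts), lambda ex: len(ex["input_ids"]), batch_size=2, window_size=4
)
first = next(batches)
print([len(ex["input_ids"]) for ex in first])  # [1, 2]
print(len(pulled))  # 4: only the first window was materialized
```

The trade-off is that a smaller window means less work before the first batch but looser grouping, so batches may carry more padding than with a global sort.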