To my understanding, `set_transform` should apply transformations on the fly, so that the GPU can use the data immediately for training.
When I specify `group_by_length=True` on the trainer, `set_transform` no longer evaluates lazily: it goes through the whole dataset. My hunch is that it needs to apply all the transformations first in order to group by length.
Is this behavior intended? I think `group_by_length` should only look at one batch (or a small subset of the dataset) at a time, not at the whole dataset.
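
To make that last point concrete, here is a rough sketch of the behavior I would expect. It is plain Python and not the Trainer's actual sampler; `iter_length_grouped_batches`, `length_fn`, and `chunk_batches` are names I made up for illustration. The idea is to shuffle the indices, then measure and sort lengths only inside one chunk of a few batches at a time, so the transform runs only for the examples in the current chunk.

```python
import random


def iter_length_grouped_batches(length_fn, num_examples, batch_size,
                                chunk_batches=4, seed=0):
    """Yield batches of indices, grouped by length within small chunks.

    Hypothetical helper for illustration only.
    length_fn(i) returns the length of example i; it is called lazily,
    only for the indices in the chunk currently being consumed.
    """
    rng = random.Random(seed)
    indices = list(range(num_examples))
    rng.shuffle(indices)  # keep randomness across chunks

    chunk_size = batch_size * chunk_batches
    for start in range(0, num_examples, chunk_size):
        chunk = indices[start:start + chunk_size]
        # Lengths are computed here, for this chunk only.
        chunk.sort(key=length_fn, reverse=True)
        for b in range(0, len(chunk), batch_size):
            yield chunk[b:b + batch_size]


# Usage: a toy dataset of strings standing in for tokenized examples.
toy_data = ["x" * n for n in [5, 1, 9, 3, 7, 2, 8, 4, 6, 10]]
for batch in iter_length_grouped_batches(lambda i: len(toy_data[i]),
                                         len(toy_data), batch_size=2,
                                         chunk_batches=2):
    print([len(toy_data[i]) for i in batch])
```

With this approach the padding savings are smaller than with a global sort, but nothing outside the current chunk has to be transformed up front.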