How to implement Trainer's 'group_by_length' in PyTorch?

I am working on an ASR project where I use a model from Hugging Face (wav2vec2). My goal for now is to move the training process to PyTorch, so I am trying to recreate everything that Hugging Face's Trainer() class offers.

One of these utilities is the ability to group batches by length and combine this with dynamic padding (via a data collator). To be honest, however, I am not sure how to even begin this in PyTorch.

The inputs in my case are 1-D arrays that represent the raw waveform of a .wav file. So before training I need to ensure that arrays of similar size are batched together. Do I need to create a custom DataLoader (or Sampler) class so that each batch it yields contains samples of lengths as close as possible?

An idea I had was to sort the data from shortest to longest (or the reverse) and extract batch_size samples at a time. This way, the first batch will consist of the longest samples, the second batch of the next longest, and so on.
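Here is a rough sketch of what I mean, using a custom batch sampler passed to DataLoader via its batch_sampler argument, plus a simple collate function for dynamic padding. The class and function names here are my own, not from Hugging Face, and I am not sure this is the idiomatic way to do it:

```python
import random
import torch
from torch.utils.data import Sampler

class LengthGroupedBatchSampler(Sampler):
    """Yields batches of indices whose samples have similar lengths."""
    def __init__(self, lengths, batch_size, shuffle=True):
        self.lengths = lengths
        self.batch_size = batch_size
        self.shuffle = shuffle

    def __iter__(self):
        # Sort indices by sample length so neighbours have similar sizes.
        order = sorted(range(len(self.lengths)), key=lambda i: self.lengths[i])
        batches = [order[i:i + self.batch_size]
                   for i in range(0, len(order), self.batch_size)]
        if self.shuffle:
            # Shuffle the *order of batches*, keeping the length grouping.
            random.shuffle(batches)
        yield from batches

    def __len__(self):
        return (len(self.lengths) + self.batch_size - 1) // self.batch_size

def collate_pad(batch):
    # Dynamic padding: pad every waveform to the longest one in this batch.
    max_len = max(x.shape[0] for x in batch)
    return torch.stack([torch.nn.functional.pad(x, (0, max_len - x.shape[0]))
                        for x in batch])
```

Then presumably something like `DataLoader(dataset, batch_sampler=LengthGroupedBatchSampler(lengths, 8), collate_fn=collate_pad)` would work (note that when batch_sampler is given, batch_size and shuffle must not also be passed to DataLoader). Is this the right direction, or is there a better-established pattern?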

Nevertheless, I am not sure how to approach this implementation. Any advice will be greatly appreciated.

Thanks in advance.