Efficient bucketing implementation

Hi @jordiae,

These articles on building a PyTorch Text Bucket Iterator with sequences of similar length and dynamic padding may be useful:

Here are other sources with a similar use case as well:

You can also sort and filter the sequences by length, and then use the .map() function to pad or truncate each batch to a common length.
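To make the sort-then-pad idea concrete, here is a rough sketch in plain Python. It assumes the sequences are lists of token ids. The function name `bucket_batches` and its parameters (`pad_id`, `max_len`, `shuffle`, `seed`) are my own placeholders, not part of any library, so treat it as an illustration rather than a reference implementation:

```python
import random


def bucket_batches(sequences, batch_size, pad_id=0, max_len=None, shuffle=True, seed=0):
    """Group sequences of similar length and pad each batch to its own longest item.

    Hypothetical helper for illustration; not taken from any library.
    """
    # Optionally truncate overly long sequences before bucketing.
    if max_len is not None:
        seqs = [list(s[:max_len]) for s in sequences]
    else:
        seqs = [list(s) for s in sequences]

    # Sort indices by length so neighbouring sequences have similar lengths.
    order = sorted(range(len(seqs)), key=lambda i: len(seqs[i]))

    # Cut the sorted order into consecutive buckets of batch_size indices.
    buckets = [order[i:i + batch_size] for i in range(0, len(order), batch_size)]

    # Shuffle whole buckets (not their contents) so training does not
    # always see batches in short-to-long order.
    if shuffle:
        random.Random(seed).shuffle(buckets)

    batches = []
    for bucket in buckets:
        # Dynamic padding: pad only up to the longest sequence in this batch.
        longest = max(len(seqs[i]) for i in bucket)
        batches.append([seqs[i] + [pad_id] * (longest - len(seqs[i])) for i in bucket])
    return batches


data = [[1, 2, 3, 4, 5], [6], [7, 8], [9, 10, 11, 12], [13, 14, 15], [16, 17]]
batches = bucket_batches(data, batch_size=2, shuffle=False)
for batch in batches:
    print(batch)
```

Because padding is computed per batch, short sequences are never padded up to the global maximum length. The padded lists can then be converted to tensors in whatever collate step you already have.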

Hopefully one of these fits your use case.
