@sgugger This is completely off topic, but do you think we could implement grouping by length inside a pipeline to prevent slowdowns caused by large differences in sequence lengths? This would only apply to users who run the pipeline on a Dataset object. I’d be happy to contribute this. What would be an appropriate forum to discuss the details?
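
To make the idea concrete, here is a rough sketch of what grouping by length could look like. It is plain Python and not the pipeline's actual API: `length_grouped_batches` and the `lengths` list are hypothetical names used only for illustration, and a real implementation would also need to restore the original order of the outputs.

```python
from typing import Iterator, List, Sequence


def length_grouped_batches(
    lengths: Sequence[int], batch_size: int
) -> Iterator[List[int]]:
    """Yield batches of dataset indices whose sequences have similar lengths.

    Hypothetical helper: `lengths[i]` is assumed to be the token count of
    example `i`, computed ahead of time.
    """
    # Sort indices by sequence length so that neighbours need little padding.
    order = sorted(range(len(lengths)), key=lambda i: lengths[i])
    # Cut the sorted index list into consecutive chunks of `batch_size`.
    for start in range(0, len(order), batch_size):
        yield order[start:start + batch_size]


# Made-up token counts for six inputs, mixing short and long sequences.
lengths = [5, 120, 7, 118, 6, 119]
batches = list(length_grouped_batches(lengths, batch_size=3))
# Short inputs are batched together, and so are long ones.
```

With this grouping, each batch is padded only up to the longest sequence among similar-length neighbours, rather than to the longest sequence in an arbitrary mix.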