Does the Dataset instance have a "batched reduce" style method?

Hey, I’d like to know: does a Dataset instance have a “batched reduce” style method?

For example, each row of my dataset has a column named “num of person”. To get the total number of persons across all rows, I have to loop over every row and add that row’s “num of person” value to a running total.

So I want to know whether there is a batched-reduce method that can achieve the goal above, since looping over every row is kind of slow. I know that an HF Dataset instance already has a batched map method; equivalently, maybe it has a batched reduce method as well?

Hi! Adding “batched reduce” was attempted once in the pull request “Add reduce function” by AJDERS (Pull Request #5533 in huggingface/datasets on GitHub), but we decided not to merge it for the reasons mentioned in the PR. There, you can find a Colab that explains how to use Dataset.map to get the same result.
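
For readers who just want the gist, here is a minimal sketch of one way to emulate a batched reduce with Dataset.map. It is my own illustration and not necessarily what the Colab does. The helper names `partial_sum`, `total_persons` and `demo` and the output column `partial_sum` are made up for this example; the only library features assumed are `Dataset.map` with `batched=True`, `batch_size` and `remove_columns`, plus `Dataset.from_dict`.

```python
def partial_sum(batch):
    # With batched=True, `batch` is a dict mapping column names to lists.
    # Return a single row per batch holding that batch's partial sum.
    return {"partial_sum": [sum(batch["num of person"])]}


def total_persons(ds, batch_size=1000):
    # Each batch collapses to one row, so the original columns must be
    # removed; otherwise the column lengths would no longer match.
    partials = ds.map(
        partial_sum,
        batched=True,
        batch_size=batch_size,
        remove_columns=ds.column_names,
    )
    # Final reduce step over the (much shorter) list of partial sums.
    return sum(partials["partial_sum"])


def demo():
    # Imported here so the helpers above can be read and tested on their own.
    from datasets import Dataset

    ds = Dataset.from_dict({"num of person": [1, 2, 3, 4, 5]})
    return total_persons(ds, batch_size=2)
```

Calling `demo()` builds a tiny in-memory dataset and sums its “num of person” column in batches of two. For a plain sum, `sum(ds["num of person"])` should also work, since indexing a Dataset by column name returns that column's values; the map-based version is mainly useful when the per-batch step is more involved.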