Dataset mapping slows down at the end

Hello! Can someone explain whether it is possible to fix `dataset.map` slowing down at the end when `num_proc` > 1? For example, I have a dataset with 60 million samples. The first 98% of it was processed in 30 minutes, at about 20k samples per second, but going from 98% to 99% took 11 hours, at only a few samples per second. I tried reducing `num_proc`, but it didn't help. I've read other topics about this, including the answers from @lhoestq, but I didn't find anything relevant.
