Read CSV multi threading

Hi! Currently the CSV loader doesn't leverage multithreading or multiprocessing.

This is something we are working on: see the issue "[load_dataset] shard and parallelize the process · Issue #2650 · huggingface/datasets · GitHub", which should make it possible to parallelize the conversion over multiple CSV files.
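In the meantime, here is a rough sketch of what parallelizing over multiple CSV files can look like using only the Python standard library. This is not the `datasets` implementation and not what the issue proposes. The helper names `read_csv_file` and `read_csv_files_parallel` are hypothetical, and it assumes each file is UTF-8, has a header row, and fits in memory.

```python
import csv
from concurrent.futures import ThreadPoolExecutor


def read_csv_file(path):
    """Parse one CSV file into a list of dict rows (hypothetical helper)."""
    with open(path, newline="", encoding="utf-8") as f:
        return list(csv.DictReader(f))


def read_csv_files_parallel(paths, max_workers=4):
    """Parse several CSV files concurrently (hypothetical helper).

    pool.map yields results in the same order as `paths`,
    so the row order is deterministic.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        per_file_rows = list(pool.map(read_csv_file, paths))
    # Flatten the per-file results into a single list of rows.
    return [row for rows in per_file_rows for row in rows]
```

Threads mostly help here by overlapping file I/O. Since CSV parsing in pure Python is limited by the GIL, swapping in `ProcessPoolExecutor` may give better speedups when parsing is the bottleneck.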

However, I'm not very familiar with tools that allow multithreading on a single file. So if you have any ideas or pointers that could speed up the conversion of CSV files to Arrow, feel free to share them here :slight_smile:
