Read CSV multi threading

imthanhlv · July 21, 2021, 11:53am

The Arrow file reading worked beautifully.
In my case, I resized the tmpfs to 300GB and my dataset finally fits ;__;

I will modify the script so that next time it loads the preprocessed dataset with load_from_disk, as you suggested, instead of loading the raw dataset again.
