Thank you @lhoestq
Reading the Arrow file worked beautifully.
In my case, I resized the tmpfs to 300GB and my dataset finally fits ;__;
I will modify the script so that next time it loads the preprocessed dataset with `load_from_disk`, as you suggested, instead of loading the raw dataset again.
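
A minimal sketch of that "load the preprocessed copy if it exists, otherwise build and save it" pattern. The helper name `get_dataset`, its parameters, and the `build` callable are hypothetical, not part of the `datasets` library. By default it relies on `datasets.load_from_disk` and `Dataset.save_to_disk`, and the import is done lazily so the helper can also be used with other load/save functions:

```python
from pathlib import Path


def get_dataset(processed_dir, build, load=None, save=None):
    """Return the preprocessed dataset, rebuilding it only when no saved copy exists.

    processed_dir: directory holding the saved, preprocessed dataset (hypothetical name).
    build: zero-argument callable that loads the raw data and preprocesses it.
    load / save: optional overrides; default to the Hugging Face `datasets` calls.
    """
    if load is None or save is None:
        # Lazy import: only needed when the defaults are used.
        from datasets import load_from_disk

        load = load or load_from_disk
        save = save or (lambda ds, path: ds.save_to_disk(path))

    path = Path(processed_dir)
    # A non-empty directory is treated as an existing saved copy (an assumption).
    if path.exists() and any(path.iterdir()):
        return load(str(path))

    ds = build()          # expensive step: raw loading + preprocessing
    save(ds, str(path))   # persist so the next run can skip the step above
    return ds
```

With the defaults, a call would look like `get_dataset("/path/to/processed", build=my_preprocess)`, where `my_preprocess` is a placeholder for whatever function in the script loads the raw data and applies the preprocessing.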