Loading custom dataset without cache - using load script

I need to use the IITCDIP dataset on a server where I only get 6-hour time slots. The data loader takes more than 6 hours to build the cache, so I can't do my training. Is there any way to:

  1. Load a custom dataset without caching (streaming), using a script similar to here.

  2. Resume the caching process.

  3. Cache the dataset on one system and use it on another system.

Note that I have tried num_proc values up to 64 but did not get any speedup in the caching process.

You can first prepare your dataset on one system, then use my_dataset.save_to_disk() to save it to a directory of your choice. Then you can move this directory to the other system and reload the dataset there with datasets.load_from_disk().

Hope that helps!