Streaming dataset and cache

For anyone else stumbling upon this thread, I’ve found that PyTorch implements the sort of caching I was talking about through the num_workers and pin_memory parameters of the DataLoader. For the HF Trainer, the equivalent parameter is named dataloader_num_workers.
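As a quick illustration (my own sketch, not taken from the docs; the dataset and values are placeholders), the two APIs expose these knobs like this:

```python
from torch.utils.data import DataLoader
from transformers import TrainingArguments

# Plain PyTorch: worker processes prepare batches in the background,
# pin_memory allocates page-locked host memory for faster GPU transfers.
loader = DataLoader(
    my_dataset,        # placeholder: any map-style or iterable dataset
    batch_size=32,
    num_workers=4,
    pin_memory=True,
)

# HF Trainer: the same knobs are passed through TrainingArguments.
args = TrainingArguments(
    output_dir="out",
    dataloader_num_workers=4,
    dataloader_pin_memory=True,  # already True by default in recent versions
)
```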
If num_workers > 0, one or more processes that are independent of the main one prepare the next batches while the forward/backward passes are happening. Nothing is stored on disk, though; the prefetched batches live in RAM (see the sketch below).
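If you want to control how far ahead the workers run, the DataLoader also accepts prefetch_factor and persistent_workers (again just a sketch, values are illustrative):

```python
from torch.utils.data import DataLoader

loader = DataLoader(
    my_dataset,               # placeholder dataset
    batch_size=32,
    num_workers=4,
    prefetch_factor=2,        # each worker keeps ~2 batches prepared ahead, held in RAM
    persistent_workers=True,  # keep worker processes alive across epochs
)
```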