For anyone else stumbling upon this thread, I've found that PyTorch implements the sort of caching I was talking about through the `num_workers` and `pin_memory` parameters of the DataLoader. For the HF trainer, the parameter is named `dataloader_num_workers`.
If `num_workers > 0`, one or more worker processes, independent from the main one, perform the data loading and preprocessing while the forward/backward passes are happening. Nothing is stored on disk though; it all stays in RAM.
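
In case it helps, here's a minimal sketch of where these parameters go. The dataset, batch size, and worker counts are just placeholder values I picked for the example:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset

# Dummy dataset just so the example runs; replace with your own Dataset.
dataset = TensorDataset(torch.randn(1024, 16), torch.randint(0, 2, (1024,)))

loader = DataLoader(
    dataset,
    batch_size=32,
    num_workers=4,     # 4 background worker processes prefetch/collate batches
    pin_memory=True,   # page-locked host memory speeds up host->GPU copies
)

# The equivalent knob on the HF Trainer side (values here are just examples):
from transformers import TrainingArguments

training_args = TrainingArguments(
    output_dir="out",
    dataloader_num_workers=4,    # forwarded to the underlying DataLoader
    dataloader_pin_memory=True,  # pinning is enabled by default, if I recall correctly
)
```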