TPU VM training - each process loads the dataset

I’m trying to get custom model training working and I’m following “Hugging Face on PyTorch / XLA TPUs”. I can start the run just fine with the command from “Train Your Transformer on Cloud TPUs”.

However, I’m using a custom 16 GB text dataset. It seems that each process loads and caches the dataset independently, so my 200 GB of disk space fills up quickly and the run fails.

Is there a way to keep only a single copy of the cached dataset on disk?

One workaround I’ve found is to first run with a single TPU core, which builds the cache. A second run with all 8 cores then has no problem, since the cache is already there.
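
I imagine the same effect could be achieved in a single run with something like the sketch below: only the master ordinal builds the datasets cache, while the other processes wait at a rendezvous and then just read the cached result. This is only a rough sketch — `load_and_tokenize` is a placeholder for my own loading/tokenization code, and I’m assuming `xm.is_master_ordinal` and `xm.rendezvous` from torch_xla behave as described.

```python
import torch_xla.core.xla_model as xm
from datasets import load_dataset


def load_and_tokenize(tokenizer, data_file, cache_dir):
    # Placeholder for my own preprocessing: load the raw text file and
    # tokenize it, relying on the datasets library's on-disk cache.
    raw = load_dataset("text", data_files=data_file, cache_dir=cache_dir)
    return raw.map(
        lambda examples: tokenizer(examples["text"]),
        batched=True,
        remove_columns=["text"],
    )


def get_dataset(tokenizer, data_file, cache_dir):
    # Non-master processes wait here until the master has finished
    # writing the cache, then load the already-cached result instead
    # of re-processing (and re-caching) the dataset themselves.
    if not xm.is_master_ordinal():
        xm.rendezvous("dataset_cache")

    dataset = load_and_tokenize(tokenizer, data_file, cache_dir)

    # The master reaches the rendezvous last, releasing the others.
    if xm.is_master_ordinal():
        xm.rendezvous("dataset_cache")

    return dataset
```

If that works the way I expect, it would avoid both the duplicate preprocessing work and the duplicate cache files, since all 8 processes would share the single cache written by the master.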