I’m trying to get custom model training working, following “Hugging Face on PyTorch / XLA TPUs”. I can start the process just fine with the command from “Train Your Transformer on Cloud TPUs”.
However, I’m training on a custom 16 GB txt dataset. Each spawned process seems to load and cache the dataset independently, so my 200 GB of disk space fills up quickly and the run fails.
Is there a way to keep only one instance of the dataset (or its cache) on disk, shared across all processes?
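For context, the kind of pattern I was hoping for is a build-once, read-many cache guarded by a lock: the first process does the expensive load and writes a cache, and the others block and then read that cache instead of rebuilding it. Here’s a rough stdlib-only sketch of the idea (`get_dataset` and `build_fn` are just illustrative names of my own, not from the tutorial; a real setup would presumably use something like an XLA barrier or the `datasets` library’s own caching instead):

```python
import fcntl
import json
import os

def get_dataset(cache_path, build_fn):
    """Build the dataset once and share it via an on-disk cache.

    The first process to grab the exclusive lock runs build_fn and
    writes the result to cache_path; every other process blocks on
    the lock and then just reads the cached copy, so the dataset
    exists on disk a single time instead of once per process.
    """
    lock_path = cache_path + ".lock"
    with open(lock_path, "w") as lock:
        fcntl.flock(lock, fcntl.LOCK_EX)  # blocks until we own the lock
        try:
            if not os.path.exists(cache_path):
                # Expensive step: load + tokenize the raw txt corpus.
                data = build_fn()
                with open(cache_path, "w") as f:
                    json.dump(data, f)
            with open(cache_path) as f:
                return json.load(f)
        finally:
            fcntl.flock(lock, fcntl.LOCK_UN)
```

With this, calling `get_dataset` from every worker would only run `build_fn` in one of them; the rest would pick up the cache. Is there a built-in way to get this behavior from the training script itself?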