I’m using the Hugging Face Trainer for training, but my dataset is quite large, so I want each process to read only its own part of the dataset (I know this can be done with `split_dataset_by_node` or by manually handling rank and world size). However, I noticed that the Trainer calls `accelerate.prepare()`, which wraps my DataLoader so that it still fetches data according to the rank. How can I resolve this issue?
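For reference, here is a minimal sketch of the manual rank/world-size split I have in mind. It assumes the launcher (e.g. `torchrun`) exports `RANK` and `WORLD_SIZE`; the helper `shard_for_rank` and the file names are hypothetical, just for illustration, and this is not how the Trainer itself does it.

```python
import os


def shard_for_rank(items, rank, world_size):
    """Return the strided slice of `items` that belongs to `rank`."""
    if not 0 <= rank < world_size:
        raise ValueError(f"rank {rank} is outside world size {world_size}")
    # Rank r takes items r, r + world_size, r + 2 * world_size, ...
    return items[rank::world_size]


# Assumption: the launcher sets these; fall back to a single process otherwise.
rank = int(os.environ.get("RANK", 0))
world_size = int(os.environ.get("WORLD_SIZE", 1))

# Hypothetical list of data files making up the large dataset.
files = [f"shard-{i:03d}.jsonl" for i in range(10)]
my_files = shard_for_rank(files, rank, world_size)
```

Each process would then build its dataset only from `my_files`. `split_dataset_by_node` from `datasets.distributed` serves the same purpose when working with a `datasets` object directly.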
Thank you in advance for your help!