As far as I know, in PyTorch a RandomSampler cannot be used directly in distributed data parallel training, since a DistributedSampler is required instead (this link discusses the problem). I am wondering whether accelerator.prepare(dataloader) handles the data split across multiple GPUs if I keep the RandomSampler, so that the subsets seen by each device are mutually exclusive.
You don’t have to worry about using a distributed sampler with Accelerate. Whatever your sampler is, Accelerate will automatically shard it for all processes.
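For concreteness, here is a minimal sketch of what that looks like in user code (the toy TensorDataset, batch size, and variable names are just placeholders for illustration):

```python
import torch
from torch.utils.data import DataLoader, RandomSampler, TensorDataset
from accelerate import Accelerator

# Toy dataset purely for illustration.
dataset = TensorDataset(torch.arange(1000).float().unsqueeze(1))

# Build the dataloader with a plain RandomSampler, exactly as in single-GPU code.
sampler = RandomSampler(dataset)
dataloader = DataLoader(dataset, sampler=sampler, batch_size=8)

accelerator = Accelerator()

# prepare() wraps the dataloader so that each process only iterates over its own
# shard of each (shuffled) epoch; no DistributedSampler needs to be created by hand.
dataloader = accelerator.prepare(dataloader)

for batch in dataloader:
    # Each batch is already placed on the correct device for this process.
    pass
```

When launched with `accelerate launch`, each process runs this same script and receives a non-overlapping portion of the shuffled data per epoch.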
That’s great! Thanks!