Using DistributedSampler with accelerate

I want to run a CustomSFTTrainer (which inherits from SFTTrainer, which in turn inherits from the Trainer class) on a multi-GPU setup using Accelerate. I understand that the Trainer class already uses Accelerate under the hood, and so it creates a dataloader appropriately and calls accelerator.prepare(dataloader) as part of its train method.

However, I can't tell whether it uses a DistributedSampler. I noticed that the Trainer only uses a RandomSampler, and Accelerate in turn swaps in a SeedableRandomSampler rather than a DistributedSampler. I want each GPU to train on its own exclusive chunk of the data so that training is faster.

How do I use DistributedSampler with Accelerate and the built-in Trainer class?
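For reference, the kind of override I was considering looks roughly like this. It's just a sketch: I'm assuming `args.world_size` / `args.process_index` are the right way to get the replica count and rank here, and I'm not sure how this interacts with the `accelerator.prepare()` call the Trainer makes afterwards.

```python
from torch.utils.data import DataLoader
from torch.utils.data.distributed import DistributedSampler
from trl import SFTTrainer


class CustomSFTTrainer(SFTTrainer):
    def get_train_dataloader(self) -> DataLoader:
        # Explicitly shard the dataset across processes instead of relying
        # on the default RandomSampler.
        sampler = DistributedSampler(
            self.train_dataset,
            num_replicas=self.args.world_size,
            rank=self.args.process_index,
            shuffle=True,
        )
        return DataLoader(
            self.train_dataset,
            batch_size=self.args.per_device_train_batch_size,
            sampler=sampler,
            collate_fn=self.data_collator,
        )
```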


There may be no advantage to explicitly using DistributedSampler…

You don’t have to worry about using a distributed sampler with Accelerate. Whatever sampler you use, Accelerate will automatically shard the data across all processes.
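If you want to convince yourself, a quick sanity check along these lines (hypothetical script name, launched with `accelerate launch --num_processes 2 shard_check.py`) should print disjoint batches on each process:

```python
import torch
from torch.utils.data import DataLoader, TensorDataset
from accelerate import Accelerator

accelerator = Accelerator()

# 16 samples, batch size 4 -> 4 batches in total.
dataset = TensorDataset(torch.arange(16))
dataloader = DataLoader(dataset, batch_size=4, shuffle=False)

# prepare() wraps the dataloader so that each process only iterates over
# its own shard of the batches.
dataloader = accelerator.prepare(dataloader)

for (batch,) in dataloader:
    print(f"process {accelerator.process_index}: {batch.tolist()}")
```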

I see. So, just to be clear: given any sampler, Accelerate will ensure that the data is split into exclusive chunks per GPU? Interesting, because I wasn’t able to find this functionality in Accelerate’s prepare_dataloader method. Is it wrapped in some other Accelerate method?


It’s hard to tell where exactly this happens in the library’s code, since it’s heavily optimized, and there’s no example that directly documents the mechanism. A small single-process sketch of the wrapper involved is below.
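If I’m reading the source correctly, `Accelerator.prepare` sends dataloaders through `accelerate.data_loader.prepare_data_loader`, which wraps the batch sampler in a `BatchSamplerShard` so that each process only sees its own slice of the batches. Here is a single-process sketch of that wrapper; the exact class location and signature are my reading of recent Accelerate versions, so treat them as an assumption.

```python
from torch.utils.data import BatchSampler, SequentialSampler
from accelerate.data_loader import BatchSamplerShard

# 16 samples, batches of 4 -> 4 batches in total.
base = BatchSampler(SequentialSampler(range(16)), batch_size=4, drop_last=False)

# Simulate two processes: each one iterates over a disjoint subset of the batches.
for process_index in range(2):
    shard = BatchSamplerShard(base, num_processes=2, process_index=process_index)
    print(f"process {process_index}: {list(shard)}")
```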
