How does Accelerate ensure uniqueness of data samples across GPUs?


I have the following questions about the inner workings of Accelerate. If there is an existing document answering these, please link to it.

  1. When using DDP, how does Accelerate sample data from the dataloader so that each data instance is used exactly once across all GPUs (i.e. the batches on different GPUs don't share examples)? Does this behaviour change when loading from an IterableDataset?

  2. When passing the same seed to the accelerate.utils.set_seed() function, is reproducibility guaranteed for DDP training runs? That is, is the k-th batch on each GPU the same across different training runs?

I am using an IterableDataset for training in a DDP setting and want to ensure that training runs are reproducible. If this does not come out of the box with Accelerate, please point me to how to achieve it.
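For context, here is roughly the kind of manual, seeded sharding I am asking about. This is a minimal pure-Python sketch, not Accelerate code; `rank` and `world_size` stand in for the values a distributed launcher would provide:

```python
# Hedged sketch: shard a seeded stream so each rank sees a disjoint
# slice, and the same seed reproduces the same shards across runs.
import random

def sharded_stream(seed, rank, world_size, n=16):
    rng = random.Random(seed)   # seeded RNG -> reproducible order
    order = list(range(n))
    rng.shuffle(order)
    # each rank keeps every world_size-th example of the shuffled stream
    for i, example in enumerate(order):
        if i % world_size == rank:
            yield example

run1 = [list(sharded_stream(seed=0, rank=r, world_size=2)) for r in (0, 1)]
run2 = [list(sharded_stream(seed=0, rank=r, world_size=2)) for r in (0, 1)]
assert run1 == run2                      # same seed -> same shards
assert not set(run1[0]) & set(run1[1])   # ranks never share examples
```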


Check out our samplers, which split up the data as we grab it: accelerate/src/accelerate/ at main · huggingface/accelerate · GitHub (see BatchSamplerShard and IterableDatasetShard)
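To illustrate the idea (this is a simplified sketch, not Accelerate's actual implementation): each process keeps only its own share of the batches the underlying sampler produces, here dealt out round-robin, so no index is ever seen by two GPUs:

```python
# Illustrative round-robin sharding of batches across processes.
def shard_batches(batches, num_processes, process_index):
    """Yield only the batches assigned to one process (round-robin)."""
    for i, batch in enumerate(batches):
        if i % num_processes == process_index:
            yield batch

# 8 examples, batch size 2 -> 4 batches of indices
batches = [[0, 1], [2, 3], [4, 5], [6, 7]]

gpu0 = list(shard_batches(batches, num_processes=2, process_index=0))
gpu1 = list(shard_batches(batches, num_processes=2, process_index=1))
print(gpu0)  # [[0, 1], [4, 5]]
print(gpu1)  # [[2, 3], [6, 7]]

# every index appears exactly once across the two GPUs
assert sorted(i for b in gpu0 + gpu1 for i in b) == list(range(8))
```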


Thanks for your reply and for sharing the reference. To confirm my understanding, let me provide an example.

Assume two setups:
(1) DDP on 2 GPUs, and a per_gpu_batch_size of 16
(2) Training on a single GPU with a batch size of 32

It seems that what the Accelerate samplers do for setup (1) is equivalent to setup (2): the sampler just splits the data across GPUs, so the model trains on the same examples in both setups. Is that correct?
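Here is the check I have in mind, as a small sketch (plain Python, assuming no shuffling and round-robin assignment of the size-16 batches, just to make the comparison concrete):

```python
# Do 2 GPUs x per_gpu_batch_size 16 see, per step, the same 32
# examples as a single GPU with batch size 32?
data = list(range(64))

def make_batches(examples, batch_size):
    return [examples[i:i + batch_size]
            for i in range(0, len(examples), batch_size)]

# setup (2): single GPU, batch size 32
single_gpu = make_batches(data, 32)

# setup (1): batches of 16, dealt round-robin to 2 processes
batches16 = make_batches(data, 16)
gpu0 = batches16[0::2]
gpu1 = batches16[1::2]

# per step k, the union of the two GPU batches equals the single-GPU batch
for k in range(len(single_gpu)):
    assert sorted(gpu0[k] + gpu1[k]) == single_gpu[k]
```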

However, when I tried these two setups, the DDP case gave a higher loss than the single-GPU setup. What could be the reason for that?