Accelerate - WeightedRandomSampler Dataloader

Hi!
As you mention in this post:
Accelerator .prepare() replaces custom DataLoader Sampler - :hugs:Accelerate - Hugging Face Forums

When we use a custom sampler, it is kept and used by the downstream processes.

But today it works like this:


The sampler generates batches, and each batch is assigned to a process according to its index.

With a weighted sampler, the same data rows are likely to appear in multiple batches, which means the same data can end up on multiple processes.
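
To illustrate the concern, here is a small plain-Python simulation. It is only a sketch and does not use Accelerate or PyTorch: `random.choices` stands in for a weighted sampler that draws with replacement, and `assign_round_robin` is a hypothetical helper that mimics dispatching batches to processes by index. The weights and sizes are made up for the example.

```python
import random


def weighted_batches(weights, num_samples, batch_size, seed=0):
    """Draw row indices with replacement, proportionally to `weights`,
    then group them into consecutive batches."""
    rng = random.Random(seed)
    indices = rng.choices(range(len(weights)), weights=weights, k=num_samples)
    return [indices[i:i + batch_size] for i in range(0, num_samples, batch_size)]


def assign_round_robin(batches, num_processes):
    """Hypothetical dispatch rule: batch i goes to process i % num_processes."""
    per_process = [[] for _ in range(num_processes)]
    for i, batch in enumerate(batches):
        per_process[i % num_processes].append(batch)
    return per_process


# 10 rows; row 0 is heavily over-weighted (illustrative values only).
weights = [50.0] + [1.0] * 9
batches = weighted_batches(weights, num_samples=16, batch_size=4)
per_process = assign_round_robin(batches, num_processes=2)

# Collect the distinct rows each process received and intersect them.
seen = [{idx for batch in proc for idx in batch} for proc in per_process]
shared = seen[0] & seen[1]

for rank, proc in enumerate(per_process):
    print(f"process {rank}: {proc}")
print("rows seen by both processes:", sorted(shared))
```

Because the draw is with replacement and one row dominates the weights, that row shows up in the batches of both processes.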

I would like to make sure each process uses independent data…

Is this currently possible?
If yes, how?
If not, how should I implement it?

I created a script to reproduce the issue; it is here:
Dataloader WeightedRandomSampler + Distributed Training · Issue #2865 · huggingface/accelerate (github.com)

Thanks for your help and feedback.

In the end, what I would like is behavior more like this:
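
A minimal plain-Python sketch of one way to read "independent data per process": every process makes the same weighted draw without replacement (using a shared seed), then keeps only its own slice. This is an assumption about the desired behavior, not something Accelerate provides. The helper names `weighted_sample_without_replacement` and `shard_for_rank` are hypothetical, and drawing without replacement is a swapped-in technique (Efraimidis-Spirakis keys) that changes the sampling distribution compared with a weighted sampler that draws with replacement.

```python
import random


def weighted_sample_without_replacement(weights, k, seed):
    """Efraimidis-Spirakis sampling: give each row the key u ** (1 / w)
    with u uniform in [0, 1), and keep the k rows with the largest keys."""
    rng = random.Random(seed)
    keyed = [(rng.random() ** (1.0 / w), i) for i, w in enumerate(weights) if w > 0]
    keyed.sort(reverse=True)
    return [i for _, i in keyed[:k]]


def shard_for_rank(indices, rank, world_size):
    """Each rank keeps every world_size-th index, starting at its own rank."""
    return indices[rank::world_size]


# Illustrative values only: 10 rows, row 0 heavily over-weighted, 2 processes.
weights = [50.0] + [1.0] * 9
world_size = 2

# Every rank uses the same seed, so all ranks agree on the global draw.
global_draw = weighted_sample_without_replacement(weights, k=8, seed=0)
shards = [shard_for_rank(global_draw, rank, world_size) for rank in range(world_size)]

for rank, shard in enumerate(shards):
    print(f"process {rank}: {shard}")
print("overlap:", set(shards[0]) & set(shards[1]))
```

Since the global draw contains no repeated index and the slices do not overlap, no row lands on two processes within the same draw.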