As mentioned, your custom sampler will still be used. The new sampler takes the batches your sampler produces and distributes them across your GPUs, so the flow is: your original sampler → new sampler, which dispatches the batches.
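To make the "old sampler → new sampler" flow concrete, here is a minimal pure-Python sketch. It assumes the wrapper assigns whole batches to processes round-robin, which is an illustrative choice; a real library may split batches differently. `CustomBatchSampler` and `ShardedBatchSampler` are hypothetical names, not a library API.

```python
from typing import Iterable, Iterator, List


class CustomBatchSampler:
    """Stand-in for your own custom sampler: yields lists of dataset indices."""

    def __init__(self, n_items: int, batch_size: int):
        self.n_items = n_items
        self.batch_size = batch_size

    def __iter__(self) -> Iterator[List[int]]:
        for start in range(0, self.n_items, self.batch_size):
            yield list(range(start, min(start + self.batch_size, self.n_items)))


class ShardedBatchSampler:
    """Hypothetical wrapper: iterates the original sampler and keeps only the
    batches assigned to this process (round-robin, an assumption)."""

    def __init__(self, inner: Iterable[List[int]], num_processes: int, process_index: int):
        self.inner = inner
        self.num_processes = num_processes
        self.process_index = process_index

    def __iter__(self) -> Iterator[List[int]]:
        for i, batch in enumerate(self.inner):
            # Batch i goes to process i % num_processes.
            if i % self.num_processes == self.process_index:
                yield batch


if __name__ == "__main__":
    inner = CustomBatchSampler(n_items=8, batch_size=2)
    for rank in range(2):
        print(rank, list(ShardedBatchSampler(inner, num_processes=2, process_index=rank)))
```

With 8 items and a batch size of 2, process 0 gets `[[0, 1], [4, 5]]` and process 1 gets `[[2, 3], [6, 7]]`. The custom sampler still decides which indices form each batch; the wrapper only decides which GPU receives it.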
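To check this, add a print statement to your custom sampler (for example, in its `__iter__` method) and iterate over the dataloader after calling `prepare`. If the print fires as batches are drawn, your sampler is still the one producing them.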
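The print-statement check can be sketched without a GPU. This is a minimal pure-Python illustration: `LoggingBatchSampler` is a hypothetical custom sampler with a print in `__iter__`, and `prepare` here is a hypothetical stand-in for the real `prepare` step, assumed to dispatch batches round-robin. It is not the library's implementation.

```python
from typing import Iterator, List


class LoggingBatchSampler:
    """Custom batch sampler with a print statement, so you can see that it is
    still the source of the batches after wrapping."""

    def __init__(self, n_items: int, batch_size: int):
        self.n_items = n_items
        self.batch_size = batch_size
        self.yielded: List[List[int]] = []  # record of every batch produced

    def __iter__(self) -> Iterator[List[int]]:
        for start in range(0, self.n_items, self.batch_size):
            batch = list(range(start, min(start + self.batch_size, self.n_items)))
            print(f"custom sampler yielded {batch}")
            self.yielded.append(batch)
            yield batch


def prepare(batch_sampler, num_processes: int, process_index: int) -> Iterator[List[int]]:
    """Hypothetical stand-in for a `prepare` step: wraps the original sampler
    and dispatches its batches round-robin to one process."""
    for i, batch in enumerate(batch_sampler):
        if i % num_processes == process_index:
            yield batch


if __name__ == "__main__":
    sampler = LoggingBatchSampler(n_items=6, batch_size=2)
    for batch in prepare(sampler, num_processes=2, process_index=0):
        print(f"process 0 received {batch}")
```

Running it prints a "custom sampler yielded" line for every batch, including the ones routed to the other process, followed by "process 0 received" for the batches this process keeps. That output pattern is what indicates the original sampler sits underneath the dispatching one.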