Using Transformers with DistributedDataParallel — any examples?

The introduction to the Accelerate library says I have to be willing to write my own training loop (forgoing the Trainer). Is there a way for me to enable DDP training while continuing to use the Trainer?
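
For what it's worth, the Trainer already supports DDP out of the box: if you launch your script with `torchrun` (or the older `python -m torch.distributed.launch`), `TrainingArguments` picks up the local rank from the environment and the Trainer wraps the model in `DistributedDataParallel` for you. A minimal sketch, where the model and dataset choices are just placeholders for illustration:

```python
# train.py
# Launch with, e.g.:  torchrun --nproc_per_node=4 train.py
# The Trainer detects the distributed environment set up by torchrun and
# wraps the model in DistributedDataParallel automatically.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForSequenceClassification.from_pretrained("bert-base-uncased")

dataset = load_dataset("glue", "sst2")

def tokenize(batch):
    return tokenizer(batch["sentence"], truncation=True, padding="max_length")

dataset = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="out",
    # Batch size per GPU; the effective batch size is this times the world size.
    per_device_train_batch_size=16,
)

trainer = Trainer(model=model, args=args, train_dataset=dataset["train"])
trainer.train()
```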

Replacing `_get_train_sampler` with `_get_eval_sampler` looks like a much more elegant solution, thank you!
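
For anyone finding this later, the override ends up looking roughly like the sketch below: subclass Trainer and have `_get_train_sampler` delegate to `_get_eval_sampler`, so training iterates the data in order instead of shuffling. Caveats: both methods are private Trainer internals whose signatures have changed across transformers versions, and the class name here is made up.

```python
from transformers import Trainer

class SequentialSamplerTrainer(Trainer):  # hypothetical name, for illustration
    def _get_train_sampler(self):
        # Reuse the evaluation sampler logic for training; in distributed
        # runs the eval sampler still shards the dataset across ranks.
        return self._get_eval_sampler(self.train_dataset)
```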
