How do I run an end-to-end example of distributed data parallel (DDP) with Hugging Face's Trainer API, ideally on a single node with multiple GPUs?
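For context, here is the kind of minimal end-to-end sketch I have in mind, pieced together from the docs. The model name, dataset, and hyperparameters below are placeholders for illustration, not a verified recipe; the key point is that when the script is launched with `torchrun`, the Trainer picks up the distributed environment variables and wraps the model in DistributedDataParallel on its own, with no DDP code in the script itself.

```python
# train.py — minimal Trainer script. Launched with torchrun, one process
# runs per GPU and Trainer handles the DDP wrapping automatically.
from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

model_name = "bert-base-uncased"  # placeholder model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

dataset = load_dataset("imdb")  # placeholder dataset

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, padding="max_length", max_length=128)

tokenized = dataset.map(tokenize, batched=True)

args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,  # per GPU; effective batch = 8 * num_gpus
    num_train_epochs=1,
)

trainer = Trainer(model=model, args=args, train_dataset=tokenized["train"])
trainer.train()

# Single node, e.g. 4 GPUs:
#   torchrun --nproc_per_node=4 train.py
```

Is something like this the intended pattern, or is extra setup (process group init, samplers, etc.) needed beyond what `torchrun` provides?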

@cyt79 Have you solved this? I'm running into the same issue.