How do I run an end-to-end example of distributed data parallel (DDP) training with Hugging Face's Trainer API, ideally on a single node with multiple GPUs?

However, launching my script with torchrun didn’t work:

torchrun --nnodes 2 ...my_script.py

It seems to deadlock even though my PyTorch libraries are up to date. Does anyone know what might be happening here?
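One way to narrow down a hang like this is to have each worker print the environment variables that torchrun sets (RANK, LOCAL_RANK, WORLD_SIZE) before any training code runs. The sketch below is only a debugging aid, not part of the Trainer API; the helper name describe_worker is made up for illustration. If nothing is printed at all, the launcher is probably still waiting at rendezvous. In particular, --nnodes 2 tells torchrun to expect two machines, so on a single machine it may just be waiting for a second node to join.

```python
import os


def describe_worker(env=None):
    """Return the distributed settings torchrun exports to each worker.

    `describe_worker` is a hypothetical helper for debugging only.
    Defaults describe a plain single-process run (no torchrun).
    """
    env = os.environ if env is None else env
    return {
        "rank": int(env.get("RANK", 0)),              # global rank of this process
        "local_rank": int(env.get("LOCAL_RANK", 0)),  # rank within this node, usually the GPU index
        "world_size": int(env.get("WORLD_SIZE", 1)),  # total number of processes across all nodes
    }


if __name__ == "__main__":
    # Put this at the very top of the training script so it runs
    # before any model, dataset, or process-group setup.
    print(describe_worker(), flush=True)
```

On a single node with N GPUs, each of the N workers should report world_size equal to N and a distinct local_rank.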

torch                   1.12.1+cu113
torchaudio              0.12.1+cu113
torchtext               0.13.1
torchvision             0.13.1+cu113

I installed, and then upgraded, them with:

pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113
pip install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu113 --upgrade