Does the HF Trainer class support multi-node training?

@sgugger I tried distributed training per this HF blog, and it failed as soon as I went multi-node: it gave an mpirun error when I tried to use multiple instances (similar to what I saw when following the AWS blog). In both cases I tried ml.p3.16xlarge and ml.p3dn.24xlarge instances.
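For reference, this is roughly the launch setup I'm using, following the blog. The entry point, role, versions, and hyperparameters here are placeholders standing in for my actual values, so treat this as a sketch rather than the exact failing config:

```python
# Sketch of the multi-node SageMaker launch that fails for me.
# entry_point, role, versions, and hyperparameters are placeholders.
from sagemaker.huggingface import HuggingFace

# SageMaker distributed data parallel, as in the HF blog
distribution = {"smdistributed": {"dataparallel": {"enabled": True}}}

huggingface_estimator = HuggingFace(
    entry_point="train.py",            # training script using the Trainer class
    source_dir="./scripts",
    instance_type="ml.p3.16xlarge",    # also tried ml.p3dn.24xlarge
    instance_count=2,                  # multi-node: this is where the mpirun error appears
    role="<sagemaker-execution-role>", # placeholder
    transformers_version="4.6",        # placeholder; tried several container versions
    pytorch_version="1.7",
    py_version="py36",
    distribution=distribution,
    hyperparameters={"epochs": 1, "per_device_train_batch_size": 16},
)

huggingface_estimator.fit()
```

With `instance_count=1` the same script trains fine; the failure only shows up when I raise the count.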

Edit: I tried it as written and with all variations of the current PyTorch/HF URI containers (can't link them here; new accounts are limited to 2 links).

Any ideas on how to fix this?