Multi-GPU training

It seems that the Hugging Face implementation still uses nn.DataParallel for single-node multi-GPU training.
The PyTorch documentation clearly states: "It is recommended to use DistributedDataParallel instead of DataParallel to do multi-GPU training, even if there is only a single node." Could you please clarify whether my understanding is correct, and whether your training supports DistributedDataParallel for one node with multiple GPUs?

Both are supported by the Hugging Face Trainer. You just have to use the PyTorch launcher to use DistributedDataParallel; see an example here.

How do I find the example?

See here. For example:

python -m torch.distributed.launch --nproc_per_node=NUM_GPUS_YOU_HAVE YOUR_TRAINING_SCRIPT.py (--arg1 --arg2 --arg3 and all other arguments of your training script)
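To make this concrete, below is a minimal sketch of what such a training script could look like with the Trainer. The model name, the IMDB subset, and the script's structure are placeholders chosen for illustration, not something prescribed by the thread. The launcher injects a --local_rank argument; forwarding it to TrainingArguments makes the Trainer set up DistributedDataParallel, while running the same script with plain python falls back to DataParallel on a multi-GPU machine.

# Hypothetical minimal training script (YOUR_TRAINING_SCRIPT.py in the command above).
# Launch with: python -m torch.distributed.launch --nproc_per_node=NUM_GPUS this_script.py
import argparse

from datasets import load_dataset
from transformers import (
    AutoModelForSequenceClassification,
    AutoTokenizer,
    Trainer,
    TrainingArguments,
)

parser = argparse.ArgumentParser()
# torch.distributed.launch passes --local_rank to the script; default -1 means "no DDP".
parser.add_argument("--local_rank", type=int, default=-1)
args = parser.parse_args()

model_name = "distilbert-base-uncased"  # placeholder model for illustration
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSequenceClassification.from_pretrained(model_name, num_labels=2)

# Small IMDB subset just to keep the example quick; swap in your own dataset.
dataset = load_dataset("imdb", split="train[:1%]")
dataset = dataset.map(
    lambda batch: tokenizer(
        batch["text"], truncation=True, padding="max_length", max_length=128
    ),
    batched=True,
)

training_args = TrainingArguments(
    output_dir="out",
    per_device_train_batch_size=8,
    num_train_epochs=1,
    # -1 -> single GPU / DataParallel; >= 0 -> DistributedDataParallel via the launcher.
    local_rank=args.local_rank,
)

Trainer(model=model, args=training_args, train_dataset=dataset).train()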