Distributed training

Why is the training argument n_gpu set to 1 when using the Trainer's distributed training?
Does that mean only one device is allowed per node?
The printed training parameters are also calculated with n_gpu=1.
What should I do when every node has multiple GPUs?

The script is https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py
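
To make my setup concrete, here is a minimal sketch of what I am checking on each worker. I am assuming the script is launched with torchrun (`--nnodes=2 --nproc_per_node=8`), i.e. one process per GPU; the attribute names below (`n_gpu`, `world_size`) are just what I am reading off `TrainingArguments`:

```python
# A minimal sketch of what I check on each worker process -- assuming the script
# is launched with torchrun --nnodes=2 --nproc_per_node=8 (one process per GPU).
from transformers import TrainingArguments

args = TrainingArguments(output_dir="tmp_out", per_device_train_batch_size=32)

# Each worker process is pinned to a single GPU, so n_gpu is reported as 1.
print("n_gpu per process:", args.n_gpu)    # prints 1 under torchrun
# The total number of worker processes across both nodes is the world size.
print("world size:", args.world_size)      # I would expect 16 (2 nodes x 8 GPUs)
```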
Here are the training parameters that were printed:


Actually, I have two nodes, each with 8 GPUs, so the total batch size should be 32 * 8 * 2 = 512, not the 32 * 2 = 64 that results from counting n_gpu=1 per node.
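
Put differently, this is the arithmetic I expect for the effective global batch size (a rough sketch, assuming per_device_train_batch_size=32 and gradient_accumulation_steps=1):

```python
# Rough batch-size arithmetic for my setup (assumptions: 32 samples per device,
# gradient_accumulation_steps=1, 2 nodes with 8 GPUs each).
per_device_train_batch_size = 32
gpus_per_node = 8
num_nodes = 2
gradient_accumulation_steps = 1

world_size = gpus_per_node * num_nodes                  # 16 processes in total
effective_batch_size = (per_device_train_batch_size
                        * world_size
                        * gradient_accumulation_steps)
print(effective_batch_size)                             # 512, not 64
```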