Why is the training argument `n_gpu` set to 1 when using the Trainer's distributed training?
Does that mean only one device is allowed per node?
The printed training parameters are also calculated with `n_gpu=1`.
I want to know what I should do when every node has multiple GPUs.
The script is https://github.com/huggingface/transformers/blob/main/examples/pytorch/language-modeling/run_mlm.py
Here are the printed training parameters:
Actually, I have two nodes, each with 8 GPUs, so the total batch size should be 32 * 8 * 2 = 512 instead of 32 * 2 = 64, which is what you get if you calculate with n_gpu=1 per node.
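To make the arithmetic explicit, here is a minimal sketch (values assumed from my setup: per-device batch size 32, 2 nodes, 8 GPUs each) of how I expect the total batch size to come out when each launched process drives exactly one GPU:

```python
# Under torchrun/torch.distributed, each process owns ONE GPU,
# so n_gpu=1 per process and the world size carries the parallelism.
per_device_train_batch_size = 32  # assumed value from my run
n_gpu_per_process = 1             # one process per GPU under distributed launch
nodes = 2
gpus_per_node = 8
world_size = nodes * gpus_per_node  # 16 processes in total

total_batch_size = (per_device_train_batch_size
                    * n_gpu_per_process
                    * world_size)
print(total_batch_size)  # 512, not 64
```

So if the logged total is 64, it looks like the world size is being taken as 2 (one process per node) instead of 16.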