How can I check whether communication between multiple nodes is working correctly?

I have implemented a distributed training setup with 2 nodes and 8 GPUs following the guides available here, and it was functioning properly. However, when I increased the number of nodes to 3, GPU usage became very low and it looked as if training was running only on the CPU. I suspect there is an issue with the communication between the nodes. Could someone please guide me on how to check whether the communication between the nodes is working correctly?

Here are the accelerate configs for each node:

# host
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 0
main_process_ip: 10.237.38.151
main_process_port: 12150
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 3
num_processes: 12
rdzv_backend: static
same_network: true
use_cpu: false
# client 0
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 1
main_process_ip: 10.237.38.151
main_process_port: 12150
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 3
num_processes: 12
rdzv_backend: static
same_network: true
use_cpu: false
# client 1
compute_environment: LOCAL_MACHINE
deepspeed_config: {}
distributed_type: MULTI_GPU
downcast_bf16: 'no'
dynamo_backend: 'NO'
fsdp_config: {}
gpu_ids: all
machine_rank: 2
main_process_ip: 10.237.38.151
main_process_port: 12150
main_training_function: main
megatron_lm_config: {}
mixed_precision: 'no'
num_machines: 3
num_processes: 12
rdzv_backend: static
same_network: true
use_cpu: false
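
One way to sanity-check inter-node NCCL communication, independent of the training script, is to run a tiny all_reduce test on every node using these same configs. The script below is only a rough sketch under that assumption; the filename all_reduce_test.py and the launch command are illustrative, not taken from the accelerate docs:

# all_reduce_test.py - a minimal sketch (the filename is just an example).
# Launch it on every node with that node's accelerate config, e.g.:
#   accelerate launch --config_file <node config> all_reduce_test.py
import os
import torch
import torch.distributed as dist

def main():
    # accelerate launch should export RANK, WORLD_SIZE, MASTER_ADDR,
    # MASTER_PORT and LOCAL_RANK, so init_process_group can read them
    # from the environment (the default "env://" rendezvous).
    dist.init_process_group(backend="nccl")
    local_rank = int(os.environ.get("LOCAL_RANK", 0))
    torch.cuda.set_device(local_rank)

    rank = dist.get_rank()
    world_size = dist.get_world_size()

    # Each rank contributes its own rank number; after all_reduce every rank
    # should print the sum 0 + 1 + ... + (world_size - 1).
    t = torch.tensor([float(rank)], device="cuda")
    dist.all_reduce(t, op=dist.ReduceOp.SUM)
    expected = world_size * (world_size - 1) / 2
    print(f"rank {rank}/{world_size}: all_reduce -> {t.item()} (expected {expected})")

    dist.destroy_process_group()

if __name__ == "__main__":
    main()

If the launch hangs at init_process_group or the printed sums are wrong, the rendezvous or the NCCL transport between the nodes is the likely culprit; setting NCCL_DEBUG=INFO in the environment when launching usually shows which network interface and transport NCCL picked.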

Can all three nodes ping each other?
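
Besides ping, it is worth confirming that each client node can actually open a TCP connection to the host's rendezvous endpoint (main_process_ip:main_process_port from the configs). A rough sketch of such a check in Python, with the address hard-coded from the configs above (the script name is hypothetical, not part of accelerate):

# port_check.py - a hypothetical helper. Run it on each client node while
# `accelerate launch` is already waiting on the host, since the port is only
# open once the host process has started.
import socket
import sys

HOST = "10.237.38.151"   # main_process_ip from the configs above
PORT = 12150             # main_process_port from the configs above

try:
    with socket.create_connection((HOST, PORT), timeout=5):
        print(f"TCP connection to {HOST}:{PORT} succeeded")
except OSError as exc:
    print(f"TCP connection to {HOST}:{PORT} failed: {exc}")
    sys.exit(1)

If ping works but this connection is refused or times out, a firewall or routing rule between the nodes is a likely cause.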