Accelerator.prepare hangs on a single machine with multiple GPUs

When I run:

import torch.nn as nn
from accelerate import Accelerator

if __name__ == "__main__":
    accelerator = Accelerator()
    model = nn.Conv2d(10, 20, 3, 1, 1)
    print("prepare")
    model = accelerator.prepare(model)
    print("done")

with config:

compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: all
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 3
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

“done” never gets printed.
“done” does get printed if I set num_processes to 1.

But on another server using the same config, “done” gets printed.
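
To check whether the hang is specific to Accelerate or happens at the NCCL/DDP level as well, this is roughly the bare torch.distributed version of the same test I can run (just a sketch; the file name is arbitrary and it would be launched with torchrun --nproc_per_node 3):

import torch
import torch.distributed as dist
import torch.nn as nn

if __name__ == "__main__":
    # same NCCL backend that Accelerate uses for MULTI_GPU
    dist.init_process_group("nccl")
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    model = nn.Conv2d(10, 20, 3, 1, 1).cuda()
    print("wrapping in DDP", rank)
    # constructing DDP broadcasts the parameters from rank 0, which is
    # essentially the same collective accelerator.prepare(model) ends up running
    model = nn.parallel.DistributedDataParallel(model, device_ids=[rank])
    print("done", rank)
    dist.destroy_process_group()

If this also stalls at the DDP line, the problem is below Accelerate.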

NCCL debug output for the server that’s not working:

prepare
prepare
prepare
osprey2:1465885:1465885 [0] NCCL INFO Bootstrap : Using eno1:128.174.136.28<0>
osprey2:1465885:1465885 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
osprey2:1465885:1465885 [0] NCCL INFO cudaDriverVersion 11070
NCCL version 2.14.3+cuda11.7
osprey2:1465885:1466929 [0] NCCL INFO Failed to open libibverbs.so[.1]
osprey2:1465885:1466929 [0] NCCL INFO NET/Socket : Using [0]eno1:128.174.136.28<0>
osprey2:1465885:1466929 [0] NCCL INFO Using network Socket
osprey2:1465891:1465891 [2] NCCL INFO cudaDriverVersion 11070
osprey2:1465889:1465889 [1] NCCL INFO cudaDriverVersion 11070
osprey2:1465889:1465889 [1] NCCL INFO Bootstrap : Using eno1:128.174.136.28<0>
osprey2:1465889:1465889 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
osprey2:1465889:1466930 [1] NCCL INFO Failed to open libibverbs.so[.1]
osprey2:1465889:1466930 [1] NCCL INFO NET/Socket : Using [0]eno1:128.174.136.28<0>
osprey2:1465889:1466930 [1] NCCL INFO Using network Socket
osprey2:1465891:1465891 [2] NCCL INFO Bootstrap : Using eno1:128.174.136.28<0>
osprey2:1465891:1465891 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
osprey2:1465891:1466931 [2] NCCL INFO Failed to open libibverbs.so[.1]
osprey2:1465891:1466931 [2] NCCL INFO NET/Socket : Using [0]eno1:128.174.136.28<0>
osprey2:1465891:1466931 [2] NCCL INFO Using network Socket
osprey2:1465889:1466930 [1] NCCL INFO Setting affinity for GPU 1 to ffff0000,ffff0000
osprey2:1465891:1466931 [2] NCCL INFO Setting affinity for GPU 2 to ffff0000,ffff0000
osprey2:1465885:1466929 [0] NCCL INFO Setting affinity for GPU 0 to ffff,0000ffff
osprey2:1465891:1466931 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1 [2] -1/-1/-1->2->1 [3] -1/-1/-1->2->1
osprey2:1465889:1466930 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0 [2] 2/-1/-1->1->0 [3] 2/-1/-1->1->0
osprey2:1465885:1466929 [0] NCCL INFO Channel 00/04 :    0   1   2
osprey2:1465885:1466929 [0] NCCL INFO Channel 01/04 :    0   1   2
osprey2:1465885:1466929 [0] NCCL INFO Channel 02/04 :    0   1   2
osprey2:1465885:1466929 [0] NCCL INFO Channel 03/04 :    0   1   2
osprey2:1465885:1466929 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1 [2] 1/-1/-1->0->-1 [3] 1/-1/-1->0->-1
osprey2:1465885:1466929 [0] NCCL INFO Channel 00/0 : 0[21000] -> 1[81000] via P2P/IPC
osprey2:1465889:1466930 [1] NCCL INFO Channel 00/0 : 1[81000] -> 2[e2000] via P2P/IPC
osprey2:1465891:1466931 [2] NCCL INFO Channel 00/0 : 2[e2000] -> 0[21000] via P2P/IPC
osprey2:1465885:1466929 [0] NCCL INFO Channel 01/0 : 0[21000] -> 1[81000] via P2P/IPC
osprey2:1465889:1466930 [1] NCCL INFO Channel 01/0 : 1[81000] -> 2[e2000] via P2P/IPC
osprey2:1465891:1466931 [2] NCCL INFO Channel 01/0 : 2[e2000] -> 0[21000] via P2P/IPC
osprey2:1465885:1466929 [0] NCCL INFO Channel 02/0 : 0[21000] -> 1[81000] via P2P/IPC
osprey2:1465889:1466930 [1] NCCL INFO Channel 02/0 : 1[81000] -> 2[e2000] via P2P/IPC
osprey2:1465891:1466931 [2] NCCL INFO Channel 02/0 : 2[e2000] -> 0[21000] via P2P/IPC
osprey2:1465885:1466929 [0] NCCL INFO Channel 03/0 : 0[21000] -> 1[81000] via P2P/IPC
osprey2:1465889:1466930 [1] NCCL INFO Channel 03/0 : 1[81000] -> 2[e2000] via P2P/IPC
osprey2:1465891:1466931 [2] NCCL INFO Channel 03/0 : 2[e2000] -> 0[21000] via P2P/IPC
osprey2:1465889:1466930 [1] NCCL INFO Connected all rings
osprey2:1465891:1466931 [2] NCCL INFO Connected all rings
osprey2:1465891:1466931 [2] NCCL INFO Channel 00/0 : 2[e2000] -> 1[81000] via P2P/IPC
osprey2:1465885:1466929 [0] NCCL INFO Connected all rings
osprey2:1465891:1466931 [2] NCCL INFO Channel 01/0 : 2[e2000] -> 1[81000] via P2P/IPC
osprey2:1465891:1466931 [2] NCCL INFO Channel 02/0 : 2[e2000] -> 1[81000] via P2P/IPC
osprey2:1465891:1466931 [2] NCCL INFO Channel 03/0 : 2[e2000] -> 1[81000] via P2P/IPC
osprey2:1465889:1466930 [1] NCCL INFO Channel 00/0 : 1[81000] -> 0[21000] via P2P/IPC
osprey2:1465889:1466930 [1] NCCL INFO Channel 01/0 : 1[81000] -> 0[21000] via P2P/IPC
osprey2:1465889:1466930 [1] NCCL INFO Channel 02/0 : 1[81000] -> 0[21000] via P2P/IPC
osprey2:1465889:1466930 [1] NCCL INFO Channel 03/0 : 1[81000] -> 0[21000] via P2P/IPC
osprey2:1465891:1466931 [2] NCCL INFO Connected all trees
osprey2:1465891:1466931 [2] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512
osprey2:1465891:1466931 [2] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer
osprey2:1465889:1466930 [1] NCCL INFO Connected all trees
osprey2:1465889:1466930 [1] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512
osprey2:1465889:1466930 [1] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer
osprey2:1465885:1466929 [0] NCCL INFO Connected all trees
osprey2:1465885:1466929 [0] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512
osprey2:1465885:1466929 [0] NCCL INFO 4 coll channels, 4 p2p channels, 2 p2p channels per peer
osprey2:1465885:1466929 [0] NCCL INFO comm 0x3d1af060 rank 0 nranks 3 cudaDev 0 busId 21000 - Init COMPLETE
osprey2:1465891:1466931 [2] NCCL INFO comm 0x3de99e20 rank 2 nranks 3 cudaDev 2 busId e2000 - Init COMPLETE
osprey2:1465889:1466930 [1] NCCL INFO comm 0x3bd29b10 rank 1 nranks 3 cudaDev 1 busId 81000 - Init COMPLETE

NCCL debug output for the server that’s working:

prepare
prepare
prepare
owl:1514262:1514262 [0] NCCL INFO Bootstrap : Using eno1np0:172.22.224.10<0>
owl:1514262:1514262 [0] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
owl:1514262:1514262 [0] NCCL INFO cudaDriverVersion 12000
NCCL version 2.14.3+cuda11.7
owl:1514262:1514308 [0] NCCL INFO Failed to open libibverbs.so[.1]
owl:1514262:1514308 [0] NCCL INFO NET/Socket : Using [0]eno1np0:172.22.224.10<0>
owl:1514262:1514308 [0] NCCL INFO Using network Socket
owl:1514264:1514264 [2] NCCL INFO cudaDriverVersion 12000
owl:1514263:1514263 [1] NCCL INFO cudaDriverVersion 12000
owl:1514264:1514264 [2] NCCL INFO Bootstrap : Using eno1np0:172.22.224.10<0>
owl:1514264:1514264 [2] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
owl:1514264:1514309 [2] NCCL INFO Failed to open libibverbs.so[.1]
owl:1514264:1514309 [2] NCCL INFO NET/Socket : Using [0]eno1np0:172.22.224.10<0>
owl:1514264:1514309 [2] NCCL INFO Using network Socket
owl:1514263:1514263 [1] NCCL INFO Bootstrap : Using eno1np0:172.22.224.10<0>
owl:1514263:1514263 [1] NCCL INFO NET/Plugin : No plugin found (libnccl-net.so), using internal implementation
owl:1514263:1514310 [1] NCCL INFO Failed to open libibverbs.so[.1]
owl:1514263:1514310 [1] NCCL INFO NET/Socket : Using [0]eno1np0:172.22.224.10<0>
owl:1514263:1514310 [1] NCCL INFO Using network Socket
owl:1514263:1514310 [1] NCCL INFO Setting affinity for GPU 1 to aa,aaaaaaaa
owl:1514262:1514308 [0] NCCL INFO Setting affinity for GPU 0 to 55,55555555
owl:1514264:1514309 [2] NCCL INFO Setting affinity for GPU 2 to aa,aaaaaaaa
owl:1514263:1514310 [1] NCCL INFO Trees [0] 2/-1/-1->1->0 [1] 2/-1/-1->1->0
owl:1514262:1514308 [0] NCCL INFO Channel 00/02 : 0 1 2
owl:1514262:1514308 [0] NCCL INFO Channel 01/02 : 0 1 2
owl:1514262:1514308 [0] NCCL INFO Trees [0] 1/-1/-1->0->-1 [1] 1/-1/-1->0->-1
owl:1514264:1514309 [2] NCCL INFO Trees [0] -1/-1/-1->2->1 [1] -1/-1/-1->2->1
owl:1514263:1514310 [1] NCCL INFO Channel 00 : 1[af000] -> 2[d8000] via SHM/direct/direct
owl:1514263:1514310 [1] NCCL INFO Channel 01 : 1[af000] -> 2[d8000] via SHM/direct/direct
owl:1514262:1514308 [0] NCCL INFO Channel 00 : 0[3b000] -> 1[af000] via SHM/direct/direct
owl:1514262:1514308 [0] NCCL INFO Channel 01 : 0[3b000] -> 1[af000] via SHM/direct/direct
owl:1514264:1514309 [2] NCCL INFO Channel 00 : 2[d8000] -> 0[3b000] via SHM/direct/direct
owl:1514264:1514309 [2] NCCL INFO Channel 01 : 2[d8000] -> 0[3b000] via SHM/direct/direct
owl:1514263:1514310 [1] NCCL INFO Connected all rings
owl:1514264:1514309 [2] NCCL INFO Connected all rings
owl:1514262:1514308 [0] NCCL INFO Connected all rings
owl:1514264:1514309 [2] NCCL INFO Channel 00 : 2[d8000] -> 1[af000] via SHM/direct/direct
owl:1514264:1514309 [2] NCCL INFO Channel 01 : 2[d8000] -> 1[af000] via SHM/direct/direct
owl:1514263:1514310 [1] NCCL INFO Channel 00 : 1[af000] -> 0[3b000] via SHM/direct/direct
owl:1514263:1514310 [1] NCCL INFO Channel 01 : 1[af000] -> 0[3b000] via SHM/direct/direct
owl:1514262:1514308 [0] NCCL INFO Connected all trees
owl:1514262:1514308 [0] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512
owl:1514262:1514308 [0] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
owl:1514263:1514310 [1] NCCL INFO Connected all trees
owl:1514263:1514310 [1] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512
owl:1514263:1514310 [1] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
owl:1514264:1514309 [2] NCCL INFO Connected all trees
owl:1514264:1514309 [2] NCCL INFO threadThresholds 8/8/64 | 24/8/64 | 512 | 512
owl:1514264:1514309 [2] NCCL INFO 2 coll channels, 2 p2p channels, 2 p2p channels per peer
owl:1514263:1514310 [1] NCCL INFO comm 0x3a270010 rank 1 nranks 3 cudaDev 1 busId af000 - Init COMPLETE
owl:1514264:1514309 [2] NCCL INFO comm 0x3d6b6c90 rank 2 nranks 3 cudaDev 2 busId d8000 - Init COMPLETE
owl:1514262:1514308 [0] NCCL INFO comm 0x3daa0080 rank 0 nranks 3 cudaDev 0 busId 3b000 - Init COMPLETE
done
done
done
owl:1514262:1514312 [0] NCCL INFO [Service thread] Connection closed by localRank 0
owl:1514262:1514262 [0] NCCL INFO comm 0x3daa0080 rank 0 nranks 3 cudaDev 0 busId 3b000 - Abort COMPLETE
owl:1514264:1514313 [2] NCCL INFO [Service thread] Connection closed by localRank 2
owl:1514264:1514264 [2] NCCL INFO comm 0x3d6b6c90 rank 2 nranks 3 cudaDev 2 busId d8000 - Abort COMPLETE
owl:1514263:1514311 [1] NCCL INFO [Service thread] Connection closed by localRank 1
owl:1514263:1514263 [1] NCCL INFO comm 0x3a270010 rank 1 nranks 3 cudaDev 1 busId af000 - Abort COMPLETE

I’ll need a bit more info about the machine. What GPUs? Also, if you could use back-ticks it would help a ton with the readability of your code. I just tried mimicking the same setup on a machine with 4 GPUs (but only using 3):

{
  "compute_environment": "LOCAL_MACHINE",
  "distributed_type": "MULTI_GPU",
  "downcast_bf16": false,
  "machine_rank": 0,
  "main_training_function": "main",
  "mixed_precision": "no",
  "num_machines": 1,
  "num_processes": 3,
  "rdzv_backend": "static",
  "same_network": false,
  "tpu_use_cluster": false,
  "tpu_use_sudo": false,
  "use_cpu": false
}

Script:

import torch.nn as nn
from accelerate import Accelerator

if __name__ == "__main__":
    accelerator = Accelerator()
    model = nn.Conv2d(10, 20, 3, 1, 1)
    print("prepare")
    model = accelerator.prepare(model)
    print("done")

What are your versions of Accelerate and PyTorch as well? Thanks!
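
One thing that stands out comparing the two NCCL logs: on the machine that hangs, every channel is connected via P2P/IPC, while on the working machine they go via SHM/direct/direct. NCCL init completing but the first collective never returning is a fairly common symptom of GPU peer-to-peer transfers being broken on that particular box (IOMMU/ACS settings, for example). As a quick test, a sketch only, you could disable P2P before the first NCCL call, either by exporting NCCL_P2P_DISABLE=1 in the shell or at the very top of the script:

import os
# must be set before NCCL initializes for it to have any effect
os.environ["NCCL_P2P_DISABLE"] = "1"

import torch.nn as nn
from accelerate import Accelerator

if __name__ == "__main__":
    accelerator = Accelerator()
    model = nn.Conv2d(10, 20, 3, 1, 1)
    print("prepare")
    model = accelerator.prepare(model)
    print("done")

If "done" shows up with P2P disabled, that points at the P2P path on that machine rather than at Accelerate.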

num_processes = 2 doesn’t work either.
On the server that’s not working (it only works if I set num_processes = 1):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 515.86.01    Driver Version: 515.86.01    CUDA Version: 11.7     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA A40          Off  | 00000000:21:00.0 Off |                    0 |
|ERR!   31C    P0   257W / 300W |   1648MiB / 46068MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA A40          Off  | 00000000:81:00.0 Off |                    0 |
|  0%   29C    P0    78W / 300W |   1664MiB / 46068MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA A40          Off  | 00000000:E2:00.0 Off |                    0 |
|  0%   32C    P0    77W / 300W |   3789MiB / 46068MiB |    100%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

torch is 2.0.1, accelerate version 0.20.3

The one that’s working (GPUs are RTX 2080s):

+-----------------------------------------------------------------------------+
| NVIDIA-SMI 525.60.13    Driver Version: 525.60.13    CUDA Version: 12.0     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  NVIDIA GeForce ...  Off  | 00000000:3B:00.0 Off |                  N/A |
| 22%   21C    P8     7W / 215W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   1  NVIDIA GeForce ...  Off  | 00000000:AF:00.0 Off |                  N/A |
| 22%   21C    P8    16W / 215W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+
|   2  NVIDIA GeForce ...  Off  | 00000000:D8:00.0 Off |                  N/A |
| 23%   21C    P8    19W / 215W |      0MiB /  8192MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

torch is 2.0.1, accelerate version 0.20.3

Hi, any updates?