Multi-GPU training sometimes works with 2 GPUs, but never with more than 2

Hey everybody,
for my master's thesis I'm currently trying to run class-conditional diffusion on microscopy images.
For this I need images with a resolution of 512x512, so I'm relying on a compute cluster provided by my university. Training on 1 GPU results in an epoch time of 32-45 min, which is not at all doable for me. But I can't seem to get multi-GPU training working correctly. Here are my specs:

- `Accelerate` version: 0.21.0.dev0
- Platform: Linux-3.10.0-1160.83.1.el7.x86_64-x86_64-with-glibc2.17
- Python version: 3.8.16
- Numpy version: 1.24.3
- PyTorch version (GPU?): 2.0.1+cu117 (True)
- PyTorch XPU available: False
- System RAM: 355.40 GB
- GPU type: Quadro P6000
- `Accelerate` default config:
        - compute_environment: LOCAL_MACHINE
        - distributed_type: MULTI_GPU
        - mixed_precision: fp16
        - use_cpu: False
        - num_processes: 4
        - machine_rank: 0
        - num_machines: 1
        - gpu_ids: 0,1,2,3
        - rdzv_backend: static
        - same_network: True
        - main_training_function: main
        - downcast_bf16: no
        - tpu_use_cluster: False
        - tpu_use_sudo: False
        - tpu_env: []

Docker version 23.0.1, build a5ee5b1
GPUs: 10x Quadro P6000 24GB
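
For completeness, this is roughly the minimal check I run before the full training job (just a sketch; `minimal_check.py` is an illustrative name, not part of my training code). It prints each rank's device and does a tiny NCCL all-reduce, launched with `accelerate launch minimal_check.py` under the config above:

```python
# minimal_check.py -- minimal multi-GPU sanity check (sketch), run with:
#   accelerate launch minimal_check.py
import torch
from accelerate import Accelerator

accelerator = Accelerator()
accelerator.print(f"num_processes = {accelerator.num_processes}")
print(
    f"rank {accelerator.process_index}: device = {accelerator.device}, "
    f"name = {torch.cuda.get_device_name(accelerator.device)}"
)

# Tiny all-reduce to confirm NCCL communication across all ranks.
t = torch.ones(1, device=accelerator.device)
t = accelerator.reduce(t, reduction="sum")
accelerator.print(f"all-reduce result (should equal num_processes): {t.item()}")
```

If even a minimal script like this fails, the problem is presumably in the environment or launcher rather than in my training code.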

And this is the error message I get when I try to run `accelerate test`:
```
Running:  accelerate-launch /opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/test_utils/scripts/test_script.py
stdout: [2023-07-13 20:19:50,932] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
stdout: [2023-07-13 20:19:55,740] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
stdout: [2023-07-13 20:19:55,745] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
stdout: [2023-07-13 20:19:55,758] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
stdout: [2023-07-13 20:19:55,763] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
stdout: Distributed environment: MULTI_GPU  Backend: nccl
stdout: Num processes: 4
stdout: Process index: 3
stdout: Local process index: 3
stdout: Device: cuda:3
stdout: 
stdout: Mixed precision type: fp16
stdout: 
stdout: **Initialization**
stdout: Testing, testing. 1, 2, 3.
stdout: Distributed environment: MULTI_GPU  Backend: nccl
stdout: Num processes: 4
stdout: Process index: 0
stdout: Local process index: 0
stdout: Device: cuda:0
stdout: 
stdout: Mixed precision type: fp16
stdout: 
stdout: Distributed environment: MULTI_GPU  Backend: nccl
stdout: Num processes: 4
stdout: Process index: 1
stdout: Local process index: 1
stdout: Device: cuda:1
stdout: 
stdout: Mixed precision type: fp16
stdout: 
stdout: Distributed environment: MULTI_GPU  Backend: nccl
stdout: Num processes: 4
stdout: Process index: 2
stdout: Local process index: 2
stdout: Device: cuda:2
stdout: 
stdout: Mixed precision type: fp16
stdout: 
stdout: [20:20:03] ERROR    failed (exitcode: -7) local_rank: 0 (pid: 927) of binary: /opt/conda/envs/accelerate/bin/python3                                                             api.py:672
Traceback (most recent call last):
  File "/opt/conda/envs/accelerate/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/commands/test.py", line 54, in test_command
    result = execute_subprocess_async(cmd, env=os.environ.copy())
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/test_utils/testing.py", line 383, in execute_subprocess_async
    raise RuntimeError(
RuntimeError: 'accelerate-launch /opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/test_utils/scripts/test_script.py' failed with returncode 1

The combined stderr from workers follows:
Traceback (most recent call last):
  File "/opt/conda/envs/accelerate/bin/accelerate-launch", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/commands/launch.py", line 975, in main
    launch_command(args)
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/commands/launch.py", line 960, in launch_command
    multi_gpu_launcher(args)
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/commands/launch.py", line 649, in multi_gpu_launcher
    distrib_run.run(args)
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
ChildFailedError:
============================================================
/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/test_utils/scripts/test_script.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-07-13_20:20:03
  host      : d412dfc663fe
  rank      : 1 (local_rank: 1)
  exitcode  : -7 (pid: 928)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 928
[2]:
  time      : 2023-07-13_20:20:03
  host      : d412dfc663fe
  rank      : 2 (local_rank: 2)
  exitcode  : -7 (pid: 929)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 929
[3]:
  time      : 2023-07-13_20:20:03
  host      : d412dfc663fe
  rank      : 3 (local_rank: 3)
  exitcode  : -7 (pid: 930)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 930
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-13_20:20:03
  host      : d412dfc663fe
  rank      : 0 (local_rank: 0)
  exitcode  : -7 (pid: 927)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 927
============================================================
ERROR conda.cli.main_run:execute(47): `conda run accelerate test` failed. (See above for error)
```

Any help would be greatly appreciated!

Try pulling down the latest main; I believe this was fixed yesterday.

Thank you for the tip, but sadly this didn't work. I received the same error, but with additional info:

```
Running:  accelerate-launch /opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/test_utils/scripts/test_script.py
stdout: 
stdout: ===================================BUG REPORT===================================
stdout: Welcome to bitsandbytes. For bug reports, please run
stdout: 
stdout: python -m bitsandbytes
stdout: 
stdout:  and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
stdout: ================================================================================
stdout: bin /opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so
stdout: CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
stdout: CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
stdout: CUDA SETUP: Highest compute capability among GPUs detected: 6.1
stdout: CUDA SETUP: Detected CUDA version 112
stdout: CUDA SETUP: Loading binary /opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so...
stdout: [2023-07-14 12:27:40,205] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
stdout: 
stdout: ===================================BUG REPORT===================================
stdout: Welcome to bitsandbytes. For bug reports, please run
stdout: 
stdout: python -m bitsandbytes
stdout: 
stdout:  and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
stdout: ================================================================================
stdout: bin /opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so
stdout: CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
stdout: CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
stdout: CUDA SETUP: Highest compute capability among GPUs detected: 6.1
stdout: CUDA SETUP: Detected CUDA version 112
stdout: CUDA SETUP: Loading binary /opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so...
stdout: 
stdout: ===================================BUG REPORT===================================
stdout: Welcome to bitsandbytes. For bug reports, please run
stdout: 
stdout: python -m bitsandbytes
stdout: 
stdout:  and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
stdout: ================================================================================
stdout: 
stdout: ===================================BUG REPORT===================================
stdout: Welcome to bitsandbytes. For bug reports, please run
stdout: 
stdout: python -m bitsandbytes
stdout: 
stdout:  and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
stdout: ================================================================================
stdout: bin /opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so
stdout: CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
stdout: CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so.11.0
stdout: CUDA SETUP: Highest compute capability among GPUs detected: 6.1
stdout: CUDA SETUP: Detected CUDA version 112
stdout: CUDA SETUP: Loading binary /opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so...
stdout: bin /opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so
stdout: CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
stdout: CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
stdout: CUDA SETUP: Highest compute capability among GPUs detected: 6.1
stdout: CUDA SETUP: Detected CUDA version 112
stdout: CUDA SETUP: Loading binary /opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so...
stdout: 
stdout: ===================================BUG REPORT===================================
stdout: Welcome to bitsandbytes. For bug reports, please run
stdout: 
stdout: python -m bitsandbytes
stdout: 
stdout:  and submit this information together with your error trace to: https://github.com/TimDettmers/bitsandbytes/issues
stdout: ================================================================================
stdout: bin /opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so
stdout: CUDA_SETUP: WARNING! libcudart.so not found in any environmental path. Searching in backup paths...
stdout: CUDA SETUP: CUDA runtime path found: /usr/local/cuda/lib64/libcudart.so
stdout: CUDA SETUP: Highest compute capability among GPUs detected: 6.1
stdout: CUDA SETUP: Detected CUDA version 112
stdout: CUDA SETUP: Loading binary /opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/libbitsandbytes_cuda112_nocublaslt.so...
stdout: [2023-07-14 12:27:45,766] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
stdout: [2023-07-14 12:27:46,053] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
stdout: [2023-07-14 12:27:46,056] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
stdout: [2023-07-14 12:27:46,056] [INFO] [real_accelerator.py:110:get_accelerator] Setting ds_accelerator to cuda (auto detect)
stdout: **Initialization**
stdout: Testing, testing. 1, 2, 3.
stdout: Distributed environment: MULTI_GPU  Backend: nccl
stdout: Num processes: 4
stdout: Process index: 0
stdout: Local process index: 0
stdout: Device: cuda:0
stdout: 
stdout: Mixed precision type: fp16
stdout: 
stdout: Distributed environment: MULTI_GPU  Backend: nccl
stdout: Num processes: 4
stdout: Process index: 3
stdout: Local process index: 3
stdout: Device: cuda:3
stdout: 
stdout: Mixed precision type: fp16
stdout: 
stdout: Distributed environment: MULTI_GPU  Backend: nccl
stdout: Num processes: 4
stdout: Process index: 2
stdout: Local process index: 2
stdout: Device: cuda:2
stdout: 
stdout: Mixed precision type: fp16
stdout: 
stdout: Distributed environment: MULTI_GPU  Backend: nccl
stdout: Num processes: 4
stdout: Process index: 1
stdout: Local process index: 1
stdout: Device: cuda:1
stdout: 
stdout: Mixed precision type: fp16
stdout: 
Traceback (most recent call last):
  File "/opt/conda/envs/accelerate/bin/accelerate", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/commands/accelerate_cli.py", line 45, in main
    args.func(args)
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/commands/test.py", line 54, in test_command
    result = execute_subprocess_async(cmd, env=os.environ.copy())
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/test_utils/testing.py", line 391, in execute_subprocess_async
    raise RuntimeError(
RuntimeError: 'accelerate-launch /opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/test_utils/scripts/test_script.py' failed with returncode 1

The combined stderr from workers follows:
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /opt/conda/envs/accelerate did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_oarpjszp/none_ylpto1p4/attempt_0/2/error.json')}
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /opt/conda/envs/accelerate did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_oarpjszp/none_ylpto1p4/attempt_0/3/error.json')}
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /opt/conda/envs/accelerate did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_oarpjszp/none_ylpto1p4/attempt_0/0/error.json')}
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /opt/conda/envs/accelerate did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib'), PosixPath('/usr/local/nvidia/lib64')}
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/tmp/torchelastic_oarpjszp/none_ylpto1p4/attempt_0/1/error.json')}
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so'), PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /opt/conda/envs/accelerate did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: The following directories listed in your path were found to be non-existent: {PosixPath('/usr/local/nvidia/lib64'), PosixPath('/usr/local/nvidia/lib')}
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: /usr/local/nvidia/lib:/usr/local/nvidia/lib64 did not contain ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] as expected! Searching further paths...
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: Found duplicate ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] files: {PosixPath('/usr/local/cuda/lib64/libcudart.so.11.0'), PosixPath('/usr/local/cuda/lib64/libcudart.so')}.. We'll flip a coin and try one of these, in order to fail forward.
Either way, this might cause trouble in the future:
If you get `CUDA error: invalid device function` errors, the above might be the cause and the solution is to make sure only one ['libcudart.so', 'libcudart.so.11.0', 'libcudart.so.12.0'] in the paths that we search based on your env.
  warn(msg)
/opt/conda/envs/accelerate/lib/python3.8/site-packages/bitsandbytes/cuda_setup/main.py:149: UserWarning: WARNING: Compute capability < 7.5 detected! Only slow 8-bit matmul is supported for your GPU!
  warn(msg)
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -7) local_rank: 0 (pid: 2218) of binary: /opt/conda/envs/accelerate/bin/python
Traceback (most recent call last):
  File "/opt/conda/envs/accelerate/bin/accelerate-launch", line 8, in <module>
    sys.exit(main())
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/commands/launch.py", line 985, in main
    launch_command(args)
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/commands/launch.py", line 970, in launch_command
    multi_gpu_launcher(args)
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/commands/launch.py", line 646, in multi_gpu_launcher
    distrib_run.run(args)
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/torch/distributed/run.py", line 785, in run
    elastic_launch(
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/opt/conda/envs/accelerate/lib/python3.8/site-packages/torch/distributed/launcher/api.py", line 250, in launch_agent
    raise ChildFailedError(
torch.distributed.elastic.multiprocessing.errors.ChildFailedError:
============================================================
/opt/conda/envs/accelerate/lib/python3.8/site-packages/accelerate/test_utils/scripts/test_script.py FAILED
------------------------------------------------------------
Failures:
[1]:
  time      : 2023-07-14_12:27:53
  host      : f950bbcdba9b
  rank      : 1 (local_rank: 1)
  exitcode  : -7 (pid: 2219)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 2219
[2]:
  time      : 2023-07-14_12:27:53
  host      : f950bbcdba9b
  rank      : 2 (local_rank: 2)
  exitcode  : -7 (pid: 2220)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 2220
[3]:
  time      : 2023-07-14_12:27:53
  host      : f950bbcdba9b
  rank      : 3 (local_rank: 3)
  exitcode  : -7 (pid: 2221)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 2221
------------------------------------------------------------
Root Cause (first observed failure):
[0]:
  time      : 2023-07-14_12:27:53
  host      : f950bbcdba9b
  rank      : 0 (local_rank: 0)
  exitcode  : -7 (pid: 2218)
  error_file: <N/A>
  traceback : Signal 7 (SIGBUS) received by PID 2218
============================================================
ERROR conda.cli.main_run:execute(47): `conda run accelerate test` failed. (See above for error)
```

What was the fix/bug? Can you please send a link to the commit/PR? It seems like there have been multiple changes during the last few days.

Any ideas or updates about this?

Your `nvcc --version` (CUDA toolkit) is 11.2, while your torch build's CUDA version is 11.7.
They should be the same.
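
A quick way to double-check this from inside the container (just a sketch using standard torch attributes; compare the output with `nvcc --version` and the toolkit under `/usr/local/cuda`):

```python
# Quick CUDA version comparison (sketch) -- run inside the same container/env.
import torch

print("torch:", torch.__version__)                      # e.g. 2.0.1+cu117
print("torch built against CUDA:", torch.version.cuda)  # e.g. 11.7
print("cuDNN:", torch.backends.cudnn.version())
print("GPU 0 compute capability:", torch.cuda.get_device_capability(0))  # Quadro P6000 -> (6, 1)
```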