SIGSEGV when training on multiple GPUs

I am trying to perform multi-GPU training using accelerate, but I get a SIGSEGV on my second GPU.
More specifically: training runs normally when accelerate is configured to use a single GPU, but as soon as I configure more than one, accelerate launch produces the following output:

Instantiating trainer...
Instantiating trainer...
Starting training.
Starting training.
0:	loss: 6.401798248291016	lr: 0
0: valid loss 5.869513988494873
WARNING:torch.distributed.elastic.multiprocessing.api:Sending process 3024297 closing signal SIGTERM
ERROR:torch.distributed.elastic.multiprocessing.api:failed (exitcode: -11) local_rank: 1 (pid: 3024298) of binary: /medias/tools/miniconda/envs/audiolm/bin/python
Traceback (most recent call last):
  File "/medias/tools/miniconda/envs/audiolm/bin/accelerate", line 8, in <module>
  File "/medias/tools/miniconda/envs/audiolm/lib/python3.9/site-packages/accelerate/commands/", line 45, in main
  File "/medias/tools/miniconda/envs/audiolm/lib/python3.9/site-packages/accelerate/commands/", line 970, in launch_command
  File "/medias/tools/miniconda/envs/audiolm/lib/python3.9/site-packages/accelerate/commands/", line 646, in multi_gpu_launcher
  File "/medias/tools/miniconda/envs/audiolm/lib/python3.9/site-packages/torch/distributed/", line 785, in run
  File "/medias/tools/miniconda/envs/audiolm/lib/python3.9/site-packages/torch/distributed/launcher/", line 134, in __call__
    return launch_agent(self._config, self._entrypoint, list(args))
  File "/medias/tools/miniconda/envs/audiolm/lib/python3.9/site-packages/torch/distributed/launcher/", line 250, in launch_agent
    raise ChildFailedError(
========================================================= FAILED
Root Cause (first observed failure):
  time      : 2023-08-01_20:58:02
  host      :
  rank      : 1 (local_rank: 1)
  exitcode  : -11 (pid: 3024298)
  error_file: <N/A>
  traceback : Signal 11 (SIGSEGV) received by PID 3024298

My configuration file looks like this:

compute_environment: LOCAL_MACHINE
distributed_type: MULTI_GPU
downcast_bf16: 'no'
gpu_ids: '0,1'
machine_rank: 0
main_training_function: main
mixed_precision: 'no'
num_machines: 1
num_processes: 2
rdzv_backend: static
same_network: true
tpu_env: []
tpu_use_cluster: false
tpu_use_sudo: false
use_cpu: false

I am using accelerate 0.21.0 and torch 2.0.1+cu117. nvidia-smi reports CUDA version 12.0 (i.e., the maximum CUDA version supported by the driver).
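For completeness, a small script along these lines can collect the relevant versions in one place (a diagnostic sketch, not part of my training code; it only imports torch if it is installed):

```python
# Version report to help rule out a toolkit/driver/NCCL mismatch.
import importlib.util

def torch_available() -> bool:
    """Return True if torch can be imported in this environment."""
    return importlib.util.find_spec("torch") is not None

if torch_available():
    import torch
    print("torch        :", torch.__version__)
    print("CUDA (torch) :", torch.version.cuda)         # toolkit torch was built against
    print("NCCL         :", torch.cuda.nccl.version())  # NCCL bundled with torch
    print("GPUs visible :", torch.cuda.device_count())
```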

Does anyone have an idea of what could be causing this segmentation fault? An out-of-memory error seems unlikely, since training on a single GPU works fine with the same hyperparameters.
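To narrow it down, a minimal NCCL sanity check along these lines (a sketch; the filename nccl_check.py and the helper are placeholders of my own) could show whether the crash comes from the communication stack rather than from the training code. If rank 1 also segfaults here, the model code is off the hook:

```python
# Minimal NCCL sanity check, independent of any training code.
# Run with e.g.: torchrun --nproc_per_node=2 nccl_check.py
import os

def device_for_rank(local_rank: int) -> str:
    """Map a local rank to its CUDA device string."""
    return f"cuda:{local_rank}"

# LOCAL_RANK is set by torchrun / accelerate launch; the distributed part
# only runs under such a launcher.
if "LOCAL_RANK" in os.environ:
    import torch
    import torch.distributed as dist

    local_rank = int(os.environ["LOCAL_RANK"])
    torch.cuda.set_device(device_for_rank(local_rank))
    dist.init_process_group(backend="nccl")
    t = torch.ones(1, device=device_for_rank(local_rank))
    dist.all_reduce(t)  # sums the ones across ranks
    print(f"rank {dist.get_rank()}: all_reduce -> {t.item()}")
    dist.destroy_process_group()
```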