I too am seeing the SIGSEGV error when I run the Simple NLP example on an A100 pod:
```
---------------------------------------------------------------------------
ProcessExitedException Traceback (most recent call last)
Input In [25], in <cell line: 3>()
1 from accelerate import notebook_launcher
----> 3 notebook_launcher(training_function, num_processes=4)

File ~/envs/wav2vec/lib/python3.9/site-packages/accelerate/launchers.py:129, in notebook_launcher(function, args, num_processes, use_fp16, mixed_precision, use_port)
126 launcher = PrepareForLaunch(function, distributed_type="MULTI_GPU")
128 print(f"Launching training on {num_processes} GPUs.")
--> 129 start_processes(launcher, args=args, nprocs=num_processes, start_method="fork")
131 else:
132 # No need for a distributed launch otherwise as it's either CPU or one GPU.
133 if torch.cuda.is_available():

File ~/envs/wav2vec/lib/python3.9/site-packages/torch/multiprocessing/spawn.py:198, in start_processes(fn, args, nprocs, join, daemon, start_method)
195 return context
197 # Loop on join until it returns True or raises an exception.
--> 198 while not context.join():
199 pass

File ~/envs/wav2vec/lib/python3.9/site-packages/torch/multiprocessing/spawn.py:140, in ProcessContext.join(self, timeout)
138 if exitcode < 0:
139 name = signal.Signals(-exitcode).name
--> 140 raise ProcessExitedException(
141 "process %d terminated with signal %s" %
142 (error_index, name),
143 error_index=error_index,
144 error_pid=failed_process.pid,
145 exit_code=exitcode,
146 signal_name=name
147 )
148 else:
149 raise ProcessExitedException(
150 "process %d terminated with exit code %d" %
151 (error_index, exitcode),
(...)
154 exit_code=exitcode
155 )

ProcessExitedException: process 0 terminated with signal SIGSEGV
```
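For context on the last frame: `ProcessContext.join` turns a negative child exit code into a signal name via `signal.Signals(-exitcode).name`, so "terminated with signal SIGSEGV" means process 0 was killed by a segmentation fault (signal 11 on Linux) instead of raising a Python exception. The minimal sketch below reproduces just that mapping with plain `multiprocessing`, with no GPU or accelerate involved. It assumes a POSIX system where the `fork` start method is available, and `crash` / `exit_signal_name` are hypothetical helper names used only for illustration.

```python
import multiprocessing as mp
import os
import signal


def crash() -> None:
    # Simulate a native crash by sending SIGSEGV to the current (child) process.
    os.kill(os.getpid(), signal.SIGSEGV)


def exit_signal_name() -> str:
    # Same start method that notebook_launcher passes to start_processes above.
    ctx = mp.get_context("fork")
    p = ctx.Process(target=crash)
    p.start()
    p.join()
    # multiprocessing reports "killed by signal N" as exit code -N.
    assert p.exitcode is not None and p.exitcode < 0
    # Same conversion as torch's ProcessContext.join.
    return signal.Signals(-p.exitcode).name


if __name__ == "__main__":
    print(exit_signal_name())
```

In other words, the Python traceback only reports that a child died; the actual fault happened in native code inside the worker, so the traceback itself does not show where.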
BTW, that error was also preceded by a bunch of warnings:

```
Launching training on 4 GPUs.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
- Avoid using `tokenizers` before the fork if possible
- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
```

The `huggingface/tokenizers` warning above was printed four times in total, presumably once per forked process.
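The warning itself names two remedies. Below is a minimal sketch of the second one, setting `TOKENIZERS_PARALLELISM` from inside the notebook. It assumes the cell runs first, before any tokenizer is created or used and therefore before `notebook_launcher` forks. It is only expected to deal with the fork warning; whether it has any bearing on the SIGSEGV is not established here.

```python
import os

# Run this before any Hugging Face tokenizer is created or used in the notebook,
# i.e. before notebook_launcher forks the training processes.
# "false" turns off the Rust-level parallelism in `tokenizers`; "true" keeps it.
os.environ["TOKENIZERS_PARALLELISM"] = "false"

# Afterwards, build the tokenizer / dataloaders and call
# notebook_launcher(training_function, num_processes=4) as before.
```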