Simple NLP Example not working

drscotthawley · June 2, 2022, 8:53pm

I'm seeing the SIGSEGV error too when I run the Simple NLP example on an A100 pod:

---------------------------------------------------------------------------
ProcessExitedException                    Traceback (most recent call last)
Input In [25], in <cell line: 3>()
      1 from accelerate import notebook_launcher
----> 3 notebook_launcher(training_function, num_processes=4)

File ~/envs/wav2vec/lib/python3.9/site-packages/accelerate/launchers.py:129, in notebook_launcher(function, args, num_processes, use_fp16, mixed_precision, use_port)
    126         launcher = PrepareForLaunch(function, distributed_type="MULTI_GPU")
    128         print(f"Launching training on {num_processes} GPUs.")
--> 129         start_processes(launcher, args=args, nprocs=num_processes, start_method="fork")
    131 else:
    132     # No need for a distributed launch otherwise as it's either CPU or one GPU.
    133     if torch.cuda.is_available():

File ~/envs/wav2vec/lib/python3.9/site-packages/torch/multiprocessing/spawn.py:198, in start_processes(fn, args, nprocs, join, daemon, start_method)
    195     return context
    197 # Loop on join until it returns True or raises an exception.
--> 198 while not context.join():
    199     pass

File ~/envs/wav2vec/lib/python3.9/site-packages/torch/multiprocessing/spawn.py:140, in ProcessContext.join(self, timeout)
    138 if exitcode < 0:
    139     name = signal.Signals(-exitcode).name
--> 140     raise ProcessExitedException(
    141         "process %d terminated with signal %s" %
    142         (error_index, name),
    143         error_index=error_index,
    144         error_pid=failed_process.pid,
    145         exit_code=exitcode,
    146         signal_name=name
    147     )
    148 else:
    149     raise ProcessExitedException(
    150         "process %d terminated with exit code %d" %
    151         (error_index, exitcode),
   (...)
    154         exit_code=exitcode
    155     )

ProcessExitedException: process 0 terminated with signal SIGSEGV
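For anyone reading this traceback: the `spawn.py` excerpt above raises this exception when a worker's exit code is negative, which means the worker was killed by a signal rather than failing with a Python exception. SIGSEGV is a segmentation fault, i.e. a crash in native code. The minimal stdlib-only sketch below illustrates that mapping. It is Unix-only (it uses the `fork` start method, which `notebook_launcher` passes in the traceback, and the `resource` module). The child deliberately signals itself, so it only shows what the message means. It does not reproduce or diagnose the actual crash in this notebook, and `_crash_with_segfault`, `describe_exit` and `run_child` are hypothetical helper names.

```python
import multiprocessing as mp
import os
import resource
import signal


def _crash_with_segfault() -> None:
    # Hypothetical child: simulate a native crash by sending SIGSEGV to ourselves.
    # Disable core dumps so the demo leaves no core file behind, and restore the
    # default signal action so an installed handler (e.g. faulthandler) can't intercept it.
    resource.setrlimit(resource.RLIMIT_CORE, (0, 0))
    signal.signal(signal.SIGSEGV, signal.SIG_DFL)
    os.kill(os.getpid(), signal.SIGSEGV)


def describe_exit(exitcode: int) -> str:
    # Same rule as the torch/multiprocessing/spawn.py excerpt in the traceback:
    # a negative exit code means "terminated by signal number -exitcode".
    if exitcode < 0:
        return "terminated with signal %s" % signal.Signals(-exitcode).name
    return "terminated with exit code %d" % exitcode


def run_child() -> int:
    # Use the same start method that notebook_launcher passes to start_processes.
    ctx = mp.get_context("fork")
    proc = ctx.Process(target=_crash_with_segfault)
    proc.start()
    proc.join()
    return proc.exitcode


if __name__ == "__main__":
    code = run_child()
    print("process 0", describe_exit(code))
```

Because the worker dies inside native code, no Python traceback from the worker reaches the notebook. The only information left is the signal name.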

That error was also preceded by a bunch of warnings BTW:

Launching training on 4 GPUs.
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
huggingface/tokenizers: The current process just got forked, after parallelism has already been used. Disabling parallelism to avoid deadlocks...
To disable this warning, you can either:
	- Avoid using `tokenizers` before the fork if possible
	- Explicitly set the environment variable TOKENIZERS_PARALLELISM=(true | false)
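The warning names its own workaround: set `TOKENIZERS_PARALLELISM` explicitly before the fork. The minimal sketch below assumes it runs in the first notebook cell, before any tokenizer is loaded or used. It shows the environment variable being inherited by a `fork`-started worker (Unix-only), as the 4 workers launched above would inherit it. This only makes the parallelism setting explicit and silences the warning. Nothing in the log shows that it has anything to do with the SIGSEGV. `child_sees_setting` and `_report_env` are hypothetical helper names.

```python
import multiprocessing as mp
import os

# Run this before `tokenizers` is used anywhere in the notebook, as the warning
# advises; "false" is one of the two values the warning lists.
os.environ["TOKENIZERS_PARALLELISM"] = "false"


def _report_env(queue) -> None:
    # A forked child starts with a copy of the parent's environment.
    queue.put(os.environ.get("TOKENIZERS_PARALLELISM"))


def child_sees_setting() -> str:
    ctx = mp.get_context("fork")
    queue = ctx.Queue()
    proc = ctx.Process(target=_report_env, args=(queue,))
    proc.start()
    value = queue.get(timeout=10)  # read before join to avoid blocking on the queue
    proc.join()
    return value


if __name__ == "__main__":
    print("child sees TOKENIZERS_PARALLELISM =", child_sees_setting())
```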
