I am encountering a segmentation fault issue while using the Transformers library on my Nvidia Jetson Xavier NX device. The issue occurs when I attempt to encode text with the “paraphrase-mpnet-base-v2” model.
Here are some details about my setup:
Hardware: Nvidia Jetson Xavier NX (15GB GPU, 8GB RAM, Arch)
Software: Numpy version 1.26.0, Sentence Transformer version 2.2.2
The error message I am receiving is as follows:
-> model_output = model(**encoded_input)
Segmentation fault (core dumped)
Could you please provide guidance on how to resolve this issue? Any insights, tips, or information on potential solutions would be greatly appreciated.
Thank you for your assistance.
This is probably an out-of-memory (OOM) problem. Try reducing the batch size.
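In case it helps, here is a minimal sketch of what "reducing the batch size" could look like in practice: split the sentences into small chunks and run the model on one chunk at a time. The `batches` helper and the example sentences are made up for illustration, and the actual tokenize/encode step is left as a placeholder.

```python
def batches(items, batch_size):
    """Yield consecutive slices of `items` holding at most `batch_size` elements."""
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]


# Hypothetical input; replace with your own sentences.
sentences = ["first sentence", "second sentence", "third sentence"]

for batch in batches(sentences, batch_size=1):
    # Placeholder: tokenize and run the model on `batch` here,
    # then collect the embeddings before moving to the next batch.
    print(len(batch), batch)
```

With only two short sentences the batch is already tiny, so if the crash still happens with `batch_size=1`, memory pressure from batching is unlikely to be the cause.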
TL;DR: It’s most probably not related to Python or any Python package. The root of the problem may go deeper, like to the OS or even the hardware.
I’ve been getting occasional segmentation faults when running transformers or scikit-learn scripts for over two years now, and I’ve never been able to get to the bottom of it. From what I’ve been able to find online, a segmentation fault from a Python script is unusual and has little to do with the Python side of things. It’s sometimes CUDA, sometimes the operating system, or even the C-based infrastructure that Python uses. I’ve never bothered to update or reinstall the OS (Ubuntu 18.04 in my case), and the fact that scikit-learn scripts also segfault tells me that the cause is probably not CUDA.
One consistent behavior on my system (and this is most probably system-dependent) is that when I run training or inference of a transformer model on the CPU, either a segmentation fault occurs or the PC freezes and crashes. So my rule of thumb is to always run the model on the GPU with CUDA. Maybe you can also try:
model = model.to("cuda")
model_output = model(**encoded_input.to("cuda"))
if you have a GPU available.
Segmentation fault also occasionally happens in my system even if I run the code on GPU but it doesn’t consistently occur. If this doesn’t work, I guess your best bet is to reinstall/update parts of the infrastructure. Since it is the CPU that fails in my case, I suspect it might even be a hardware-related defect.
Hello @ehalit @Sandy1857 ,
Thank you for your previous suggestions. I tried both running the model on the GPU and reducing the batch size, but unfortunately the segmentation fault still persists.
Here are the debug logs I collected during program execution:
Starting program: /home/nvidia/Documents/alpaca-python/fais/bin/python3 faiss_run.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 4625]
[New Thread 0xffffebbfb1e0 (LWP 4626)]
[New Thread 0xffffeb3fa1e0 (LWP 4627)]
[New Thread 0xffffe6bf91e0 (LWP 4628)]
[New Thread 0xffffe63f81e0 (LWP 4629)]
[New Thread 0xffffe3bf71e0 (LWP 4630)]
[New Thread 0xffffdd6fa1e0 (LWP 4631)]
[New Thread 0xffffdcef91e0 (LWP 4632)]
[New Thread 0xffffd86f81e0 (LWP 4633)]
[New Thread 0xffffd5ef71e0 (LWP 4634)]
[New Thread 0xffffd56f61e0 (LWP 4635)]
[New Thread 0xffff945ea1e0 (LWP 4638)]
[New Thread 0xffff93de91e0 (LWP 4639)]
[New Thread 0xffff935e81e0 (LWP 4640)]
[New Thread 0xffff92de71e0 (LWP 4641)]
[New Thread 0xffff925e61e0 (LWP 4642)]
[New Thread 0xffffc879c1e0 (LWP 4643)]
[New Thread 0xffffc859b1e0 (LWP 4644)]
[New Thread 0xffffc839a1e0 (LWP 4645)]
[New Thread 0xffffc3fff1e0 (LWP 4646)]
[New Thread 0xffffc3dfe1e0 (LWP 4647)]
[New Thread 0xffffc3bfd1e0 (LWP 4648)]
--Type <RET> for more, q to quit, c to continue without paging--
Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x0000fffff7f3ffb0 in __aarch64_cas4_acq ()
As it stands, I’m still seeking a solution to this issue. Any further insights or suggestions from the community would be greatly appreciated.
Thank you for your continued support.
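One more thing that might help narrow this down: the gdb output shows where the native code crashed, but not which Python line triggered it. Python's standard `faulthandler` module can print the Python-level traceback when the process receives SIGSEGV. A minimal sketch, to be placed at the very top of the script before the heavy imports:

```python
import faulthandler
import sys

# Dump the Python traceback of every thread if the process receives a fatal
# signal such as SIGSEGV. sys.__stderr__ is the interpreter's original stderr,
# so the dump still shows up if sys.stderr has been redirected.
faulthandler.enable(file=sys.__stderr__, all_threads=True)

# ... import transformers / torch and run the failing code as before ...
```

The same effect is available without editing the script by running `python3 -X faulthandler faiss_run.py`. This does not fix the crash; it only shows which Python call was active when it happened.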
@Kalvee Could you please share the code and training parameters?
Here is the code
import torch
from transformers import AutoTokenizer, AutoModel
# Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0]  # First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)
# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']
# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/paraphrase-mpnet-base-v2')
model = AutoModel.from_pretrained('sentence-transformers/paraphrase-mpnet-base-v2')
# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')
# Compute token embeddings
model_output = model(**encoded_input)
# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])
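For anyone following along, here is a dependency-free sketch of what the masked mean pooling above computes for a single sentence. It uses plain Python lists instead of tensors, and the `masked_mean` name and the toy numbers are made up for illustration; it is not the library's implementation.

```python
def masked_mean(token_embeddings, attention_mask):
    """Average token vectors, ignoring positions where the mask is 0."""
    dim = len(token_embeddings[0])
    summed = [0.0] * dim
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:
            for i, value in enumerate(vector):
                summed[i] += value
    # Mirrors torch.clamp(..., min=1e-9): avoids dividing by zero
    # when every position is padding.
    count = max(sum(attention_mask), 1e-9)
    return [s / count for s in summed]


# Three token vectors; the last one is padding and must not affect the result.
tokens = [[1.0, 2.0], [3.0, 4.0], [100.0, 100.0]]
mask = [1, 1, 0]
print(masked_mean(tokens, mask))  # [2.0, 3.0]
```

The padded position is excluded from both the sum and the count, which is why the attention mask is passed to the pooling function.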
@Kalvee What’s your PyTorch version? Try updating PyTorch and running again.
Hi @Sandy1857 ,
Thanks for your quick response and the time spent on this issue. Here is the output of pip show for the installed PyTorch:
(genai) nvidia@ubuntu:~/Documents/XavierGenAI/Miscellaneous$ pip show torch
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Author: PyTorch Team
Requires: typing-extensions, filelock, jinja2, sympy, networkx
Required-by: peft, accelerate
By the way, since I suspect the problem is in the CPU, I found that running the script with taskset, as in
taskset --cpu-list 0 python3 my_app.py
pins the process to a single CPU core and lowers the risk of a segmentation fault.
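If you would rather do the pinning from inside the script, something like the sketch below should be roughly equivalent on Linux. The `pin_to_single_core` helper is a made-up name, and capping the thread pools through environment variables assumes that the installed NumPy/PyTorch builds use OpenMP or OpenBLAS and honor these variables, which I have not verified on a Jetson.

```python
import os

# Cap native thread pools. These must be set before numpy/torch are imported.
# Assumption: the installed builds honor these variables.
os.environ.setdefault("OMP_NUM_THREADS", "1")
os.environ.setdefault("OPENBLAS_NUM_THREADS", "1")


def pin_to_single_core():
    """Restrict this process to one CPU core (Linux only).

    Returns the resulting set of allowed cores, or None if the platform
    does not expose sched_setaffinity.
    """
    if not hasattr(os, "sched_setaffinity"):
        return None
    allowed = os.sched_getaffinity(0)    # cores this process may currently use
    os.sched_setaffinity(0, {min(allowed)})  # keep only the lowest-numbered one
    return os.sched_getaffinity(0)


pinned = pin_to_single_core()
print("Allowed cores:", pinned)
```

Threads created after this call inherit the restricted affinity, so it should run before the model is loaded.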
Hi @ehalit ,
I tried this command, but the issue still persists (Segmentation fault (core dumped)).