Segmentation fault (core dumped)

Hello,

I am encountering a segmentation fault issue while using the Transformers library on my Nvidia Jetson Xavier NX device. The issue occurs when I attempt to encode text with the “paraphrase-mpnet-base-v2” model.

Here are some details about my setup:

Hardware: Nvidia Jetson Xavier NX (15GB GPU, 8GB RAM, Arch)
Software: NumPy 1.26.0, Sentence Transformers 2.2.2
The error message I am receiving is as follows:

> /home/nvidia/Documents/alpaca-python/faisstest.py(10)
-> model_output = model(**encoded_input)
Segmentation fault (core dumped)

Could you please provide guidance on how to resolve this issue? Any insights, tips, or information on potential solutions would be greatly appreciated.

Thank you for your assistance.

It’s probably an out-of-memory (OOM) issue. Try reducing the batch size.
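
For example, if you are encoding with sentence-transformers, encode() takes a batch_size argument. A minimal sketch, assuming the paraphrase-mpnet-base-v2 model from your post:

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2")
sentences = ["This is an example sentence", "Each sentence is converted"]

# A smaller batch_size lowers peak memory during encoding
embeddings = model.encode(sentences, batch_size=8)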


TL;DR: It’s most probably not related to Python or any Python package. The root of the problem may go deeper, like to the OS or even the hardware.

I’ve been occasionally getting segmentation fault errors when running transformers or scikit-learn scripts for over two years now, and I’ve never been able to get to the bottom of it. From what I’ve found online, getting segmentation faults from Python scripts is unusual and usually has little to do with the Python side of things: the cause is sometimes CUDA, sometimes the operating system, or even the C-based infrastructure that Python relies on. I’ve never bothered to update or reinstall the OS (Ubuntu 18.04 in my case), and the fact that I also get segmentation faults from scikit-learn scripts tells me the cause is probably not CUDA.

One consistent behavior on my system (and this is most probably system-dependent) is that when I run training or inference of a transformer model on the CPU, either a segmentation fault happens or the PC freezes and crashes. So, a rule of thumb for me is to always run the model on the GPU with CUDA. Maybe you can also try:

model.to("cuda")
model_output = model(**encoded_input.to("cuda"))

if you have a GPU available.
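
A guarded version of the same idea (just a sketch reusing the variable names from the snippet above, so nothing breaks on a machine without CUDA):

import torch

# Fall back to CPU if no CUDA device is present
device = "cuda" if torch.cuda.is_available() else "cpu"
model.to(device)
model_output = model(**encoded_input.to(device))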

A segmentation fault also happens on my system occasionally even when I run the code on the GPU, but it doesn’t occur consistently. If this doesn’t work, I guess your best bet is to reinstall or update parts of the infrastructure. Since it is the CPU that fails in my case, I suspect it might even be a hardware defect.


Hello @ehalit, @Sandy1857,

Thank you for your previous suggestions. I tried both running the model on the GPU and reducing the batch size, but unfortunately the segmentation fault still persists.

Here are the debug logs I collected during program execution:

Starting program: /home/nvidia/Documents/alpaca-python/fais/bin/python3 faiss_run.py
[Thread debugging using libthread_db enabled]
Using host libthread_db library "/lib/aarch64-linux-gnu/libthread_db.so.1".
[Detaching after fork from child process 4625]
[New Thread 0xffffebbfb1e0 (LWP 4626)]
[New Thread 0xffffeb3fa1e0 (LWP 4627)]
[New Thread 0xffffe6bf91e0 (LWP 4628)]
[New Thread 0xffffe63f81e0 (LWP 4629)]
[New Thread 0xffffe3bf71e0 (LWP 4630)]
[New Thread 0xffffdd6fa1e0 (LWP 4631)]
[New Thread 0xffffdcef91e0 (LWP 4632)]
[New Thread 0xffffd86f81e0 (LWP 4633)]
[New Thread 0xffffd5ef71e0 (LWP 4634)]
[New Thread 0xffffd56f61e0 (LWP 4635)]
[New Thread 0xffff945ea1e0 (LWP 4638)]
[New Thread 0xffff93de91e0 (LWP 4639)]
[New Thread 0xffff935e81e0 (LWP 4640)]
[New Thread 0xffff92de71e0 (LWP 4641)]
[New Thread 0xffff925e61e0 (LWP 4642)]
faiss_model_load
encode
[New Thread 0xffffc879c1e0 (LWP 4643)]
[New Thread 0xffffc859b1e0 (LWP 4644)]
[New Thread 0xffffc839a1e0 (LWP 4645)]
[New Thread 0xffffc3fff1e0 (LWP 4646)]
[New Thread 0xffffc3dfe1e0 (LWP 4647)]
[New Thread 0xffffc3bfd1e0 (LWP 4648)]
done
e
--Type <RET> for more, q to quit, c to continue without paging--

Thread 1 "python3" received signal SIGSEGV, Segmentation fault.
0x0000fffff7f3ffb0 in __aarch64_cas4_acq ()
   from /lib/aarch64-linux-gnu/libc.so.6

As it stands, I’m still seeking a solution to this issue. Any further insights or suggestions from the community would be greatly appreciated.

Thank you for your continued support.

Best regards.

@Kalvee Could you please share the code and training parameters?

Hello @Sandy1857,

Here is the code: 🙂

from transformers import AutoTokenizer, AutoModel
import torch


#Mean Pooling - Take attention mask into account for correct averaging
def mean_pooling(model_output, attention_mask):
    token_embeddings = model_output[0] #First element of model_output contains all token embeddings
    input_mask_expanded = attention_mask.unsqueeze(-1).expand(token_embeddings.size()).float()
    return torch.sum(token_embeddings * input_mask_expanded, 1) / torch.clamp(input_mask_expanded.sum(1), min=1e-9)


# Sentences we want sentence embeddings for
sentences = ['This is an example sentence', 'Each sentence is converted']

# Load model from HuggingFace Hub
tokenizer = AutoTokenizer.from_pretrained('sentence-transformers/paraphrase-mpnet-base-v2')
model = AutoModel.from_pretrained('sentence-transformers/paraphrase-mpnet-base-v2')

# Tokenize sentences
encoded_input = tokenizer(sentences, padding=True, truncation=True, return_tensors='pt')

# Compute token embeddings
with torch.no_grad():
    model_output = model(**encoded_input)

# Perform pooling. In this case, mean pooling.
sentence_embeddings = mean_pooling(model_output, encoded_input['attention_mask'])

print("Sentence embeddings:")
print(sentence_embeddings)

@Kalvee What’s your pytorch version? Try updating pytorch and running again.

Hi @Sandy1857,

Thanks for your quick response and the time spent on this issue. Here is the installed PyTorch version:

(genai) nvidia@ubuntu:~/Documents/XavierGenAI/Miscellaneous$ pip show torch
Name: torch
Version: 2.0.1
Summary: Tensors and Dynamic neural networks in Python with strong GPU acceleration
Home-page: https://pytorch.org/
Author: PyTorch Team
Author-email: packages@pytorch.org
License: BSD-3
Location: /home/nvidia/.local/lib/python3.8/site-packages
Requires: typing-extensions, filelock, jinja2, sympy, networkx
Required-by: peft, accelerate

By the way, since I suspect the problem is in the CPU, I figured that running the script with taskset as

taskset -c 0 python3 my_app.py

pins the process to a single CPU core and lowers the risk of a segmentation fault.
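
If taskset is inconvenient, a Python-level sketch of the same idea (assuming the crash is triggered by multi-threaded CPU code) is to cap the thread counts before the model runs:

import os
# Set before importing torch so the OpenMP/BLAS libraries pick it up
os.environ["OMP_NUM_THREADS"] = "1"

import torch
torch.set_num_threads(1)          # intra-op parallelism
torch.set_num_interop_threads(1)  # inter-op parallelism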

Hi @ehalit,

I tried this command, but the issue still persists (Segmentation fault (core dumped)).

Not sure if this will help, as I have an AMD 6700 XT card rather than Nvidia, but I got this error and resolved it by running the following:

export HSA_OVERRIDE_GFX_VERSION=10.3.0
sudo apt install libstdc++-12-dev

I also encountered a similar error and found that it was caused by an incompatibility between transformers and my conda environment, so I switched to a local Python environment and it works. Hope that helps!


I have managed to reproduce the segmentation fault on a MacBook Air M1 with the current PyTorch 2.4 and Python 3.12. The segmentation fault also occurs with Python 3.9, and always when loading the SentenceTransformer. Loading works if MPS is avoided, i.e. device = "cpu". The fact that the error does not occur then may not mean that the bug does not exist, but only that the segment boundary lies somewhere else.
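
Concretely, forcing the CPU looks like this (a sketch using the device argument of the SentenceTransformer constructor):

from sentence_transformers import SentenceTransformer

# Avoid MPS on Apple Silicon by loading directly on the CPU
model = SentenceTransformer("sentence-transformers/paraphrase-mpnet-base-v2", device="cpu")
embeddings = model.encode(["This is an example sentence"])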

The debugger showed this hierarchy

transformers → modeling_utils → load()
torch → module.py → module._load_from_state_dict(*args)
param.copy_(input_param) ← segmentation fault

This is probably not directly related to your issue but it’s similar and may help some people.

For me it was a memory issue; more specifically, I was trying to allocate more memory to WSL than was available on my system.

If you’re using Windows, you can go to your user’s root directory, find or create a ".wslconfig" file, and set the amount of RAM you want to allocate. My system has 32 GB and I was trying to allocate 16 GB, which worked fine until it didn’t.

The directory can be accessed in a number of ways; the most straightforward is running %UserProfile% from the Windows menu.
.wslconfig:

[wsl2]
memory=16GB 
processors=2

Conclusion: I just restarted my computer to free up memory, though the alternative would be to reduce the memory allocation.