I am encountering a segmentation fault while using the Transformers library on my NVIDIA Jetson Xavier NX. It occurs when I try to encode text with the “paraphrase-mpnet-base-v2” model.
Here are some details about my setup:
Hardware: Nvidia Jetson Xavier NX (15GB GPU, 8GB RAM, Arch)
Software: NumPy version 1.26.0, sentence-transformers version 2.2.2
The error message I am receiving is as follows:
Could you please provide guidance on how to resolve this issue? Any insights, tips, or information on potential solutions would be greatly appreciated.
TL;DR: It’s most likely not related to Python or any Python package. The root of the problem may lie deeper, in the OS or even the hardware.
I’ve been getting occasional segmentation faults when running transformers or scikit-learn scripts for over two years now, and I’ve never been able to get to the bottom of it. From what I’ve found online, a segmentation fault in a Python script is unusual and has little to do with the Python side of things, because pure Python code raises exceptions instead of crashing the interpreter. Sometimes it’s CUDA, sometimes the operating system, and sometimes the compiled C/C++ code underneath Python and its packages. I’ve never bothered to update or reinstall the OS (Ubuntu 18.04 in my case), and since my scikit-learn scripts, which don’t use CUDA at all, also segfault, I suspect the cause is probably not CUDA.
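Because a segfault kills the interpreter before Python can raise an exception, the usual traceback never appears. A minimal sketch, assuming Linux, of how the standard-library faulthandler module (enabled here with the real `-X faulthandler` interpreter flag) can still print which Python call was running when the native crash happened. The names run_with_faulthandler and CRASHING_SNIPPET are hypothetical, and the ctypes null-pointer read is only a stand-in for the real crash.

```python
import signal
import subprocess
import sys

# Hypothetical stand-in for the crashing workload: reading address 0 via
# ctypes raises a real SIGSEGV inside native code on Linux, similar to a
# crash in a C/CUDA extension. Swap in the model.encode(...) call to debug it.
CRASHING_SNIPPET = "import ctypes; ctypes.string_at(0)"


def run_with_faulthandler(snippet: str) -> subprocess.CompletedProcess:
    """Run `snippet` in a fresh interpreter started with `-X faulthandler`.

    With faulthandler enabled, a fatal signal such as SIGSEGV makes the
    child print its Python traceback to stderr before it dies.
    """
    return subprocess.run(
        [sys.executable, "-X", "faulthandler", "-c", snippet],
        capture_output=True,
        text=True,
    )


if __name__ == "__main__":
    result = run_with_faulthandler(CRASHING_SNIPPET)
    if result.returncode < 0:
        # A negative return code means the child was killed by that signal.
        print("killed by", signal.Signals(-result.returncode).name)
    print(result.stderr)
```

For a real script, the same effect comes from running python -X faulthandler your_script.py, setting PYTHONFAULTHANDLER=1, or calling faulthandler.enable() at the top of the script; the dumped stack shows whether the crash happened inside tokenization, the forward pass, or somewhere else.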
One consistent behavior on my system (and this is most likely system-dependent): when I run training or inference of a transformer model on the CPU, either a segmentation fault occurs or the PC freezes and crashes. So my rule of thumb is to always run the model on the GPU with CUDA. Maybe you can also try:
Segmentation faults still occasionally happen on my system even when I run the code on the GPU, but not consistently. If this doesn’t work, I guess your best bet is to reinstall or update parts of the software stack. Since it is the CPU path that fails in my case, I suspect it might even be a hardware defect.
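Because the crash is intermittent, a single run says little about whether a change (GPU instead of CPU, a smaller batch_size, a reinstall) actually helped. A minimal sketch, using only the standard library, that runs the same workload several times in fresh interpreters and tallies how each run ended. The function name tally_exit_statuses is hypothetical, and the snippet you pass in would be your own encode script.

```python
import signal
import subprocess
import sys
from collections import Counter


def tally_exit_statuses(snippet: str, runs: int = 10) -> Counter:
    """Run `snippet` `runs` times, each in a fresh interpreter, and count outcomes.

    Outcomes are "ok" for exit code 0, the signal name (e.g. "SIGSEGV")
    when the child was killed by a signal, and "exit <n>" otherwise.
    """
    outcomes: Counter = Counter()
    for _ in range(runs):
        proc = subprocess.run([sys.executable, "-c", snippet], capture_output=True)
        if proc.returncode == 0:
            outcomes["ok"] += 1
        elif proc.returncode < 0:
            # subprocess reports death-by-signal as a negative return code.
            outcomes[signal.Signals(-proc.returncode).name] += 1
        else:
            outcomes[f"exit {proc.returncode}"] += 1
    return outcomes


if __name__ == "__main__":
    print(tally_exit_statuses("pass", runs=3))
```

For example, passing one snippet that loads paraphrase-mpnet-base-v2 with device="cpu" and another with device="cuda" gives a crash count per configuration, which makes "it doesn’t consistently occur" measurable before and after each fix.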
Thank you for your suggestions. I tried both running the model on the GPU and reducing batch_size, but unfortunately the segmentation fault persists.
Here are the debug logs I collected during program execution: