Hey folks,
I’m experimenting with LLM inference (vLLM, FlashAttention, etc.) on a custom-built workstation featuring an RTX 5080 (Blackwell architecture). I compiled PyTorch 2.9.0 myself with support for sm_120 (Blackwell) using CUDA 12.8.
Everything works great with AOT compilation, but JIT compilation fails when using torch.utils.cpp_extension.load() or similar APIs, e.g. when building FlashAttention, custom CUDA kernels, or low-level ops for vLLM.
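For reference, here's a minimal sketch of the kind of runtime build that fails on my setup. load_inline() goes through the same JIT build machinery as load(); the add_one kernel is just a hypothetical placeholder, not anything from vLLM or FlashAttention:

```python
import torch
from torch.utils.cpp_extension import load_inline

# Hypothetical toy kernel: adds 1.0 to every element of a float CUDA tensor.
cuda_src = r"""
__global__ void add_one_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

torch::Tensor add_one(torch::Tensor x) {
    auto y = torch::empty_like(x);
    int n = static_cast<int>(x.numel());
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_one_kernel<<<blocks, threads>>>(x.data_ptr<float>(), y.data_ptr<float>(), n);
    return y;
}
"""

cpp_src = "torch::Tensor add_one(torch::Tensor x);"

# This triggers an nvcc/ninja build at import time; on my machine this is the
# step that breaks, while the AOT-built PyTorch wheel itself runs fine.
ext = load_inline(
    name="add_one_ext",
    cpp_sources=cpp_src,
    cuda_sources=cuda_src,
    functions=["add_one"],
    verbose=True,
)

x = torch.ones(8, device="cuda", dtype=torch.float32)
print(ext.add_one(x))  # expected: a tensor of 2.0s once the build succeeds
```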
The error shows up at runtime, as soon as anything tries to JIT-compile a kernel. After deep debugging, I found the culprit: libnvptxcompiler.so is completely missing from the .run and .deb CUDA 12.8 and 12.9 installers, as well as from the official Docker images (!). This breaks JIT support entirely for Blackwell cards.
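If you want to check your own machine, a quick probe like the following (assuming the usual /usr/local/cuda install layout; adjust the paths for your distro) shows whether the library is present at all:

```python
import ctypes
import ctypes.util
import glob

# Look for the shared library (or a static variant) under common toolkit paths.
# NOTE: /usr/local/cuda* is an assumption about where the toolkit is installed.
candidates = (
    glob.glob("/usr/local/cuda*/lib64/libnvptxcompiler*")
    + glob.glob("/usr/local/cuda*/targets/x86_64-linux/lib/libnvptxcompiler*")
)
print("On disk:", candidates or "nothing found")

# Ask the dynamic loader whether it can resolve the library from standard paths.
print("Loader resolves:", ctypes.util.find_library("nvptxcompiler"))

try:
    ctypes.CDLL("libnvptxcompiler.so")
    print("dlopen succeeded")
except OSError as exc:
    print("dlopen failed:", exc)
```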
Full report with reproduction steps and technical analysis:
https://forums.developer.nvidia.com/t/missing-libnvptxcompiler-so-in-cuda-12-8-12-9-blocking-ptx-jit-on-blackwell-gpus-sm-120-rtx-5080/338033
Let me know if you’ve encountered this or found a workaround. Currently there’s no official CUDA support for JIT kernel compilation on Blackwell, which breaks a lot of modern tooling (vLLM, FlashAttention, cpp_extension, etc.).
Thanks!
Technical FAQ – Compilation vs Runtime Execution
Q: How were you able to compile PyTorch, FlashAttention, or vLLM without libnvptxcompiler.so?
A: This shared object (.so) is only required for runtime JIT PTX compilation. It’s not needed for AOT (Ahead-Of-Time) builds, as long as the kernels are compiled ahead of time with the correct architecture flags (sm_120, compute_120, etc.).
→ That’s why compilation succeeds, but any dynamic JIT (e.g. cpp_extension.load()) fails at runtime; see the AOT build sketch below, which side-steps JIT entirely.
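For completeness, here's a minimal setup.py sketch of the AOT route the FAQ describes, i.e. building the extension ahead of time with explicit Blackwell flags. The package and source file names are placeholders:

```python
# setup.py -- minimal sketch of an AOT-built extension; "myext" and the source
# file names are placeholders, not a real project.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="myext",
    ext_modules=[
        CUDAExtension(
            name="myext",
            sources=["myext.cpp", "myext_kernels.cu"],
            extra_compile_args={
                "cxx": ["-O3"],
                # Emit real SASS for Blackwell so nothing has to be PTX-JITed
                # on the target machine at load time.
                "nvcc": ["-O3", "-gencode", "arch=compute_120,code=sm_120"],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

Built once with `pip install .`, the resulting module just loads at runtime with no JIT step. Setting TORCH_CUDA_ARCH_LIST="12.0" before the build should achieve the same target selection when you let PyTorch’s build system pick the flags.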