Hey folks,
I’m experimenting with LLM inference (vLLM, FlashAttention, etc.) on a custom-built workstation featuring an RTX 5080 (Blackwell architecture). I compiled PyTorch 2.9.0 myself with support for sm_120 (Blackwell) using CUDA 12.8.
Everything works great with AOT compilation, but JIT compilation fails when using torch.utils.cpp_extension.load() or similar APIs, e.g. when building FlashAttention, custom CUDA kernels, or low-level ops for vLLM.
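For reference, here's a minimal sketch of the kind of runtime build that fails on my setup. load_inline() goes through the same JIT build machinery as load(); the add_one kernel is just a hypothetical placeholder, not anything from vLLM or FlashAttention:

```python
import torch
from torch.utils.cpp_extension import load_inline

# Hypothetical toy kernel: adds 1.0 to every element of a float CUDA tensor.
cuda_src = r"""
__global__ void add_one_kernel(const float* in, float* out, int n) {
    int i = blockIdx.x * blockDim.x + threadIdx.x;
    if (i < n) out[i] = in[i] + 1.0f;
}

torch::Tensor add_one(torch::Tensor x) {
    auto y = torch::empty_like(x);
    int n = static_cast<int>(x.numel());
    int threads = 256;
    int blocks = (n + threads - 1) / threads;
    add_one_kernel<<<blocks, threads>>>(x.data_ptr<float>(), y.data_ptr<float>(), n);
    return y;
}
"""

cpp_src = "torch::Tensor add_one(torch::Tensor x);"

# This triggers an nvcc/ninja build at import time; on my machine this is the
# step that breaks, while the AOT-built PyTorch wheel itself runs fine.
ext = load_inline(
    name="add_one_ext",
    cpp_sources=cpp_src,
    cuda_sources=cuda_src,
    functions=["add_one"],
    verbose=True,
)

x = torch.ones(8, device="cuda", dtype=torch.float32)
print(ext.add_one(x))  # expected: a tensor of 2.0s once the build succeeds
```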
The error shows up at runtime, as soon as anything tries to JIT-compile a kernel. After deep debugging, I found the culprit: libnvptxcompiler.so is completely missing from the .run and .deb CUDA 12.8 and 12.9 installers, as well as from the official Docker images (!). This breaks JIT support entirely for Blackwell cards.
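If you want to check your own machine, a quick probe like the following (assuming the usual /usr/local/cuda install layout; adjust the paths for your distro) shows whether the library is present at all:

```python
import ctypes
import ctypes.util
import glob

# Look for the shared library (or a static variant) under common toolkit paths.
# NOTE: /usr/local/cuda* is an assumption about where the toolkit is installed.
candidates = (
    glob.glob("/usr/local/cuda*/lib64/libnvptxcompiler*")
    + glob.glob("/usr/local/cuda*/targets/x86_64-linux/lib/libnvptxcompiler*")
)
print("On disk:", candidates or "nothing found")

# Ask the dynamic loader whether it can resolve the library from standard paths.
print("Loader resolves:", ctypes.util.find_library("nvptxcompiler"))

try:
    ctypes.CDLL("libnvptxcompiler.so")
    print("dlopen succeeded")
except OSError as exc:
    print("dlopen failed:", exc)
```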
Full report with reproduction steps and technical analysis:
https://forums.developer.nvidia.com/t/missing-libnvptxcompiler-so-in-cuda-12-8-12-9-blocking-ptx-jit-on-blackwell-gpus-sm-120-rtx-5080/338033
Let me know if you’ve encountered this or found a workaround. Currently there’s no official CUDA support for JIT kernel compilation on Blackwell, which breaks a lot of modern tooling (vLLM, FlashAttention, cpp_extension, etc.).
Thanks!
Technical FAQ – Compilation vs Runtime Execution
Q: How were you able to compile PyTorch, FlashAttention, or vLLM without libnvptxcompiler.so?
A: This shared object (.so) is only required for runtime JIT PTX compilation. It’s not needed for AOT (Ahead-Of-Time) builds, as long as the kernels are compiled ahead of time with the correct architecture flags (sm_120, compute_120, etc.).
→ That’s why compilation succeeds, but any dynamic JIT (e.g. cpp_extension.load()) fails at runtime; see the AOT build sketch below, which side-steps JIT entirely.
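For completeness, here's a minimal setup.py sketch of the AOT route the FAQ describes, i.e. building the extension ahead of time with explicit Blackwell flags. The package and source file names are placeholders:

```python
# setup.py -- minimal sketch of an AOT-built extension; "myext" and the source
# file names are placeholders, not a real project.
from setuptools import setup
from torch.utils.cpp_extension import BuildExtension, CUDAExtension

setup(
    name="myext",
    ext_modules=[
        CUDAExtension(
            name="myext",
            sources=["myext.cpp", "myext_kernels.cu"],
            extra_compile_args={
                "cxx": ["-O3"],
                # Emit real SASS for Blackwell so nothing has to be PTX-JITed
                # on the target machine at load time.
                "nvcc": ["-O3", "-gencode", "arch=compute_120,code=sm_120"],
            },
        )
    ],
    cmdclass={"build_ext": BuildExtension},
)
```

Built once with `pip install .`, the resulting module just loads at runtime with no JIT step. Setting TORCH_CUDA_ARCH_LIST="12.0" before the build should achieve the same target selection when you let PyTorch’s build system pick the flags.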