Floating point exception with nightly PyTorch and CUDA

Hi,

First of all, excuse me if this post is off-topic. I believe this issue is caused by Diffusers, but I'm not opening a GitHub issue because it's probably a misconfiguration rather than a bug. I'm new to Transformers and Diffusers and I'm running into problems with the nightly versions of PyTorch.

Specifically, nvidia-smi reports CUDA version 12.9 and NVIDIA driver version 575 (see below), and I installed the nightly PyTorch build matching this CUDA version using the selector on the website. I ran some test scripts and they confirm PyTorch is working fine (some complex math calculations on CUDA). However, when I try to run a vision model with Diffusers I get "Floating point exception" and nothing else, not even the usual traceback. Specifically, I tried the example code snippet for Stable Diffusion 3.5 Medium:

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained("stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16)
pipe = pipe.to("cuda")

image = pipe(
    "A capybara holding a sign that reads Hello World",
    num_inference_steps=40,
    guidance_scale=4.5,
).images[0]
image.save("capybara.png")

I had no luck finding a solution to my problem, as the results I found were about code issues rather than library/driver issues. Here is some relevant information about my environment:

python
Python 3.11.2 (main, Apr 28 2025, 14:11:48) [GCC 12.2.0] on linux
Type "help", "copyright", "credits" or "license" for more information.
>>> import torch
>>> torch.version.cuda
'12.9'
>>> torch.cuda.is_available()
True
>>> torch.cuda.device_count
<function device_count at 0x7f60497056c0>
>>> torch.cuda.device_count()
1
>>> torch.cuda.get_device_name()
'NVIDIA GeForce RTX 5060 Ti'

I installed this PyTorch build with: pip3 install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu129

pip list
Package                  Version
------------------------ ------------------------
bitsandbytes             0.46.1
certifi                  2025.7.14
charset-normalizer       3.4.2
diffusers                0.34.0
filelock                 3.18.0
fsspec                   2025.7.0
hf-xet                   1.1.5
huggingface-hub          0.33.4
idna                     3.10
importlib_metadata       8.7.0
Jinja2                   3.1.6
MarkupSafe               3.0.2
mpmath                   1.3.0
networkx                 3.5
numpy                    2.3.1
nvidia-cublas-cu12       12.9.1.4
nvidia-cuda-cupti-cu12   12.9.79
nvidia-cuda-nvrtc-cu12   12.9.86
nvidia-cuda-runtime-cu12 12.9.79
nvidia-cudnn-cu12        9.10.2.21
nvidia-cufft-cu12        11.4.1.4
nvidia-cufile-cu12       1.14.1.1
nvidia-curand-cu12       10.3.10.19
nvidia-cusolver-cu12     11.7.5.82
nvidia-cusparse-cu12     12.5.10.65
nvidia-cusparselt-cu12   0.7.1
nvidia-nccl-cu12         2.27.5
nvidia-nvjitlink-cu12    12.9.86
nvidia-nvshmem-cu12      3.3.9
nvidia-nvtx-cu12         12.9.79
packaging                25.0
pillow                   11.2.1
pip                      23.0.1
pytorch-triton           3.4.0+gitae848267
PyYAML                   6.0.2
regex                    2024.11.6
requests                 2.32.4
safetensors              0.5.3
setuptools               66.1.1
sympy                    1.14.0
torch                    2.9.0.dev20250716+cu129
torchaudio               2.8.0.dev20250716+cu129
torchvision              0.24.0.dev20250716+cu129
tqdm                     4.67.1
triton                   3.3.1
typing_extensions        4.14.1
urllib3                  2.5.0
zipp                     3.23.0
nvidia-smi
Wed Jul 16 15:58:48 2025       
+-----------------------------------------------------------------------------------------+
| NVIDIA-SMI 575.57.08              Driver Version: 575.57.08      CUDA Version: 12.9     |
|-----------------------------------------+------------------------+----------------------+
| GPU  Name                 Persistence-M | Bus-Id          Disp.A | Volatile Uncorr. ECC |
| Fan  Temp   Perf          Pwr:Usage/Cap |           Memory-Usage | GPU-Util  Compute M. |
|                                         |                        |               MIG M. |
|=========================================+========================+======================|
|   0  NVIDIA GeForce RTX 5060 Ti     On  |   00000000:01:00.0  On |                  N/A |
|  0%   42C    P5              4W /  180W |      10MiB /  16311MiB |      0%      Default |
|                                         |                        |                  N/A |
+-----------------------------------------+------------------------+----------------------+
                                                                                         
+-----------------------------------------------------------------------------------------+
| Processes:                                                                              |
|  GPU   GI   CI              PID   Type   Process name                        GPU Memory |
|        ID   ID                                                               Usage      |
|=========================================================================================|
|  No running processes found                                                             |
+-----------------------------------------------------------------------------------------+

Let me know if you need any additional information or if I can test something for you. FYI, Ollama runs fine in this setup. Thanks for your time.

1 Like

I think this is the cause on the Hopper architecture, but you are using Blackwell…

1 Like

Yes, indeed. I tried downgrading nvidia-cublas-cu12 to 12.4.5.8, but it didn't fix the issue, since that problem affected earlier versions of the library and, as you said, the Hopper arch. I'm not sure if I should open an issue with Diffusers on GitHub… Thanks.
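
For reference, the downgrade was just a pinned install on top of the nightly wheels (exact command from memory, so treat it as approximate):

# force the older cuBLAS wheel over the one pulled in by the nightly torch
pip install --force-reinstall nvidia-cublas-cu12==12.4.5.8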

1 Like

Yeah, I think the root cause is probably upstream (PyTorch or CUDA), but we can't move forward without knowing which part of the Diffusers SD 3.5 pipeline is triggering the problem, so it would be best to raise an issue with Diffusers. Reproduction is very simple for anyone with a 50x0 card… :sweat_smile:
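
One thing that might help narrow it down before filing: Python's built-in faulthandler module catches fatal signals such as SIGFPE and prints the Python-level traceback, so wrapping the repro in it could at least show which call inside the pipeline dies. A minimal sketch:

import faulthandler
faulthandler.enable()  # dumps a Python traceback on SIGSEGV/SIGFPE/SIGABRT/etc.

import torch
from diffusers import StableDiffusion3Pipeline

pipe = StableDiffusion3Pipeline.from_pretrained(
    "stabilityai/stable-diffusion-3.5-medium", torch_dtype=torch.bfloat16
).to("cuda")

# A couple of steps is enough to hit the crash if it is in the denoising loop.
image = pipe("test prompt", num_inference_steps=2).images[0]

Running it with CUDA_LAUNCH_BLOCKING=1 can also make the reported location more precise, since kernel launches are asynchronous otherwise.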

1 Like

Done, the issue is here; I also gave you credit. Funnily enough, diffusers-cli env, the tool used to gather environment data for issue reports, also failed with the same error.

1 Like