That’s strange. The error message says that the safetensors file is corrupted, but if it’s working in ComfyUI, it’s hard to believe that it’s really corrupted. Is it a bug related to the library version?
I did delete my whole conda env to get the latest available versions of diffusers & safetensors. I’m on Python 3.12.7, torch 2.5.1-cu11.8, diffusers 0.31, safetensors 0.4.5.
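For reference, here’s a quick diagnostic sketch (nothing FLUX-specific) to confirm what the environment actually resolves to:

```python
import torch
import diffusers
import safetensors

# confirm the interpreter sees the versions you expect
print("torch:", torch.__version__, "| CUDA build:", torch.version.cuda)
print("CUDA available:", torch.cuda.is_available())
print("diffusers:", diffusers.__version__)
print("safetensors:", safetensors.__version__)
```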
I still got an error when moving the pipe to CUDA with pipe.to("cuda"): ImportError: DLL load failed while importing quanto_cuda: no module found
RuntimeError: Failed to import diffusers.models.transformers.transformer_flux because of the following error (look up to see its traceback):
Failed to import diffusers.loaders.unet because of the following error (look up to see its traceback):
Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'quantize_' from 'torchao.quantization' (e:\conda\envs\diffuser\Lib\site-packages\torchao\quantization\__init__.py)
ERROR: Could not find a version that satisfies the requirement torchao==0.6.1 (from versions: 0.0.1, 0.0.3, 0.1)
ERROR: No matching distribution found for torchao==0.6.1
I think there are no torchao releases for cu11.8; I may need to upgrade to torch 2.5.1-cu121 [torchao github]
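After reinstalling, a quick import check shows whether the wheel you got actually ships the newer API (a sketch; quantize_ only exists in recent torchao releases, which is exactly what the ImportError above is complaining about):

```python
import torchao

print("torchao:", torchao.__version__)

# old 0.0.x/0.1 wheels don't have quantize_, hence the ImportError above
from torchao.quantization import quantize_
print("quantize_ import OK")
```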
I did change my torch version to 2.4.0-cu121 and still had to build torchao from the repo. During inference, the pipe doesn’t seem to be running on the GPU, although torch itself works on CUDA. When I do pipe.to("cuda"), it gives me ImportError: DLL load failed while importing quanto_cuda.
Anyway, if it doesn’t work up to this point, it’s more likely to be a CUDA-related installation error than a problem with the Python library. I also use Windows, and Windows is very prone to errors like this.
I did try my conda environment on another 1.5 model and had no issues whatsoever (AnalogMadness loaded from a single file and ran 50 steps in 6.4 s). The latest link you gave me is more about dlib installation on Windows; so far the diffusers library doesn’t rely on that. I got rid of all the quantization functions, since the latest error came from them; the model now loads on the GPU, but it still takes forever to generate an image.
If you don’t quantize, FLUX is over 30 GB, so it won’t fit in VRAM; that’s what’s happening…
If all goes well, it should work, but it’s not very compatible with Windows environments.
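If you want to try the unquantized route anyway, offloading is the usual way to squeeze it into limited VRAM. A minimal sketch, assuming the gated black-forest-labs/FLUX.1-dev checkpoint (very slow, but it sidesteps quanto/torchao entirely):

```python
import torch
from diffusers import FluxPipeline

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed checkpoint; requires access
    torch_dtype=torch.bfloat16,
)
# stream weights to the GPU module by module instead of pipe.to("cuda");
# fits in a few GB of VRAM at the cost of much slower inference
pipe.enable_sequential_cpu_offload()

image = pipe("a photo of a cat", num_inference_steps=20).images[0]
image.save("flux_test.png")
```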
And I still got RuntimeError: mat1 and mat2 must have the same dtype, but got Half and Float, so I tried different dtypes (bfloat16, float16, float8, float8_e4m3fn).
What bugs me is that it works well in ComfyUI, although its environment doesn’t require quanto.
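For context, that mat1/mat2 error usually means two components ended up in different dtypes, e.g. a transformer cast to float16 receiving float32 activations. A minimal sketch of loading everything in one consistent dtype (the repo id is a stand-in; note also that float8_e4m3fn is a storage dtype, so plain matmuls can’t run in it):

```python
import torch
from diffusers import FluxPipeline

# pass a single torch_dtype so every component is cast consistently,
# instead of converting submodules by hand after loading
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # stand-in repo id
    torch_dtype=torch.bfloat16,
)
pipe.to("cuda")

# sanity check: these should all report the same dtype
print(pipe.transformer.dtype, pipe.text_encoder.dtype, pipe.vae.dtype)
```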
The ComfyUI log looks fine.
From the code, it looks like all the components are being loaded with 8-bit quantization by bitsandbytes, but then why is there a dtype mismatch between mat1 and mat2…?
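For reference, the bitsandbytes path in diffusers 0.31 looks roughly like this. It’s a sketch assuming the FLUX.1-dev repo, and note the unquantized components still need one matching torch_dtype, which is the usual source of the Half/Float mismatch:

```python
import torch
from diffusers import BitsAndBytesConfig, FluxPipeline, FluxTransformer2DModel

quant_config = BitsAndBytesConfig(load_in_8bit=True)

# quantize only the big transformer; bitsandbytes handles the 8-bit linears
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",  # assumed checkpoint
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)

pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,  # keep text encoders and VAE consistent
)
pipe.enable_model_cpu_offload()
```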
My PC is too weak for FLUX, so I’ll try it on the cloud on HF later.