Loading FLUX from local safetensors

I wanted to load FLUX fp8 from a local safetensors file. I don't want it from hf_hub since I already have it locally with ComfyUI.

So far I did:

import torch
from diffusers import FluxTransformer2DModel, FluxPipeline
from transformers import T5EncoderModel, CLIPTextModel
from optimum.quanto import freeze, qfloat8, quantize, QuantizedDiffusersModel

dtype = torch.bfloat16  # also tried other dtypes, see below

transformer = FluxTransformer2DModel.from_single_file(
    "path_to_comfyui/ComfyUI/models/unet/flux1DevFp8_v10.safetensors",
    torch_dtype=dtype,
)

And got:

SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

During handling of the above exception, another exception occurred:

and OSError: Unable to load weights from checkpoint file for …

I did try multiple dtypes.

I'm on Windows 10 using conda; Python 3.10.13, torch 2.2.2, diffusers 0.31.0.


I think you need to do it this way:

transformer = FluxTransformer2DModel.from_single_file("path_to_comfyui/ComfyUI/models/unet/flux1DevFp8_v10.safetensors", subfolder="transformer", torch_dtype=dtype)

Thanks for your answer.
Unfortunately, it still gives me the same SafetensorError: Error while deserializing header: InvalidHeaderDeserialization error.

Tested again with the bfloat16, float16, and qfloat8 formats.


That's strange. The error message says the safetensors file is corrupted, but if it works in ComfyUI, it's hard to believe it's really corrupted. Could it be a bug related to the library versions?

pip install -U safetensors diffusers
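You can also check whether the header itself is readable, independent of diffusers (same path as in your post):

from safetensors import safe_open

# If this fails with the same InvalidHeaderDeserialization, the file itself
# (or the installed safetensors version) is the problem, not diffusers.
with safe_open("path_to_comfyui/ComfyUI/models/unet/flux1DevFp8_v10.safetensors", framework="pt") as f:
    print(len(f.keys()), "tensors; metadata:", f.metadata())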

I did delete my whole conda env to be on the latest available versions of diffusers & safetensors. I'm now on Python 3.12.7, torch 2.5.1-cu11.8, diffusers 0.31.0, safetensors 0.4.5.

I still get an error when moving the pipe to CUDA with pipe.to("cuda"): ImportError: DLL load failed while importing quanto_cuda: no module found.


For FLUX or SD3.5, you may also need:

pip install optimum-quanto bitsandbytes torchao transformers "numpy<2" sentencepiece peft accelerate

I did, and I still got an issue at loading:

RuntimeError: Failed to import diffusers.models.transformers.transformer_flux because of the following error (look up to see its traceback):
Failed to import diffusers.loaders.unet because of the following error (look up to see its traceback):
Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'quantize_' from 'torchao.quantization' (e:\conda\envs\diffuser\Lib\site-packages\torchao\quantization\__init__.py)

My pip list:

Package            Version
------------------ -----------
accelerate         1.1.1
asttokens          2.0.5
bitsandbytes       0.44.1
Brotli             1.0.9
certifi            2024.8.30
charset-normalizer 3.3.2
colorama           0.4.6
comm               0.2.1
debugpy            1.6.7
decorator          5.1.1
diffusers          0.31.0
executing          0.8.3
filelock           3.13.1
fsspec             2024.10.0
huggingface-hub    0.26.2
idna               3.7
importlib_metadata 8.5.0
ipykernel          6.29.5
ipython            8.27.0
jedi               0.19.1
Jinja2             3.1.4
jupyter_client     8.6.0
jupyter_core       5.7.2
MarkupSafe         2.1.3
matplotlib-inline  0.1.6
mkl_fft            1.3.11
mkl_random         1.2.8
mkl-service        2.4.0
mpmath             1.3.0
nest-asyncio       1.6.0
networkx           3.2.1
ninja              1.11.1.1
numpy              1.26.4
optimum-quanto     0.2.6
packaging          24.1
parso              0.8.3
peft               0.13.2
pillow             10.4.0
pip                24.2
platformdirs       3.10.0
prompt-toolkit     3.0.43
protobuf           5.28.3
psutil             5.9.0
pure-eval          0.2.2
Pygments           2.15.1
PySocks            1.7.1
python-dateutil    2.9.0.post0
pywin32            305.1
PyYAML             6.0.2
pyzmq              25.1.2
regex              2024.11.6
requests           2.32.3
safetensors        0.4.5
sentencepiece      0.2.0
setuptools         75.1.0
six                1.16.0
stack-data         0.2.0
sympy              1.13.1
tokenizers         0.20.3
torch              2.5.1
torchao            0.1
torchaudio         2.5.1
torchvision        0.20.1
tornado            6.4.1
tqdm               4.67.0
traitlets          5.14.3
transformers       4.46.2
typing_extensions  4.11.0
urllib3            2.2.3
wcwidth            0.2.5
wheel              0.44.0
win-inet-pton      1.1.0
zipp               3.21.0

torchao seems to be out of date. Try this:

pip install -U optimum-quanto bitsandbytes torchao transformers sentencepiece peft accelerate
pip install "numpy<2"
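Then confirm what actually got picked up (the second line exercises exactly the import that failed in your traceback):

pip show torchao
python -c "from torchao.quantization import quantize_"  # should not raise on a recent torchao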

Upgrading torchao (pip install torchao==0.6.1) gives:

ERROR: Could not find a version that satisfies the requirement torchao==0.6.1 (from versions: 0.0.1, 0.0.3, 0.1)
ERROR: No matching distribution found for torchao==0.6.1

I think there is no release for cu11.8; I may need to upgrade to torch 2.5.1-cu121 [torchao GitHub].


torch=2.5.1-cu121

Yeah. Or torch==2.4.0 with cu121 to cu124 is relatively stable.
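For example, with the cu121 index from the PyTorch install page:

pip install torch==2.4.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121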

Good. Last resort:

pip install git+https://github.com/pytorch/ao

I did change my torch version to 2.4.0-cu121 and still had to build torchao from the repo. When doing inference, the pipe seems to not be running on the GPU, though torch itself works on CUDA. When I do pipe.to("cuda"), it gives me ImportError: DLL load failed while importing quanto_cuda.
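I verified torch itself roughly like this:

import torch

print(torch.cuda.is_available())      # True here
print(torch.version.cuda)             # should report 12.1 for a cu121 build
print(torch.cuda.get_device_name(0))  # my GPU shows up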

Hmmm. Possibly:

Anyway, if it doesn't work up to this point, it's more likely a CUDA-related installation error than a problem with the Python libraries. I also use Windows, and Windows is very prone to errors like this.

I did try my conda environment on another 1.5 model and got no issues whatsoever (AnalogMadness loaded from a single file, 50 steps in 6.4 s). The latest link you gave me is more about dlib installation on Windows; so far the diffusers library does not rely on that. I did get rid of all the quantization functions since the latest error came from them; the model is loaded on the GPU, but it still takes forever to generate an image.

Still loading after 20 minutes. For comparison, a FLUX inference with the same model in ComfyUI takes between 30 seconds and a minute on my GPU.


If you don't quantize, FLUX is over 30 GB, so it won't fit in VRAM; that's why it takes forever…
If all goes well it should work, but it's not very compatible with Windows environments.
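If optimum-quanto behaves, the usual flow is roughly this (quantize/freeze/qfloat8 are the optimum.quanto helpers you already imported; a sketch, not tested on your setup):

from optimum.quanto import freeze, qfloat8, quantize

# Quantize the transformer weights to float8 in place, then freeze them
quantize(transformer, weights=qfloat8)
freeze(transformer)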

Thanks for your answers. I tried an implementation with quantization_config = BitsAndBytesConfig(load_in_8bit=True):

import torch

from diffusers import FluxTransformer2DModel, FluxPipeline, BitsAndBytesConfig
from transformers import T5EncoderModel, CLIPTextModel

bfl_repo = "black-forest-labs/FLUX.1-dev"

dtype = torch.bfloat16
#dtype = torch.float16
#dtype = torch.float8_e4m3fn

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

transformer = FluxTransformer2DModel.from_single_file(
    ".../flux1DevFp8_v10.safetensors",
    quantization_config=quantization_config,
    dtype=dtype,
)

pipe = FluxPipeline.from_pretrained(
    bfl_repo,
    transformer=transformer,
    text_encoder_2=None,
    quantization_config=quantization_config,
    dtype=dtype,
)

text_encoder_2 = T5EncoderModel.from_pretrained(
    bfl_repo, subfolder="text_encoder_2", quantization_config=quantization_config
)
pipe.text_encoder_2 = text_encoder_2

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=20,
    #generator=torch.Generator("cpu").manual_seed(0)
).images[0]

And I still got RuntimeError: mat1 and mat2 must have the same dtype, but got Half and Float, so I tried different dtypes (bfloat16, float16, float8, float8_e4m3fn).

What bugs me is that it works well in ComfyUI, although its environment doesn't require quanto.

I did output a verbose log from a ComfyUI generation.


The ComfyUI log looks fine.
From the code, it looks like all the components are being loaded with 8-bit quantization by bitsandbytes, so why is there a dtype mismatch between mat1 and mat2…? :exploding_head:
My PC is too weak for FLUX, so I’ll try it on the cloud on HF later.

I found a bug. torch_dtype= is the correct argument for Diffusers.

transformer = FluxTransformer2DModel.from_single_file(".../flux1DevFp8_v10.safetensors", quantization_config=quantization_config, torch_dtype=dtype)
text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", quantization_config=quantization_config)
pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=transformer, text_encoder_2=text_encoder_2, quantization_config=quantization_config, torch_dtype=dtype)
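With that, inference should look roughly like this (enable_model_cpu_offload is optional but helps it fit in VRAM; prompt and settings are the ones from your post):

pipe.enable_model_cpu_offload()  # keep idle components on CPU to fit in VRAM

image = pipe(
    "A cat holding a sign that says hello world",
    guidance_scale=3.5,
    num_inference_steps=20,
).images[0]
image.save("flux.png")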