Loading FLUX from local safetensors

I wanted to load FLUX fp8 from a local safetensors file. I don't want it from hf_hub since I already have it locally with ComfyUI.

So far I did:

import torch
from diffusers import FluxTransformer2DModel, FluxPipeline
from transformers import T5EncoderModel, CLIPTextModel
from optimum.quanto import freeze, qfloat8, quantize, QuantizedDiffusersModel

dtype = torch.bfloat16  # also tried other dtypes, see below

transformer = FluxTransformer2DModel.from_single_file(
    "path_to_comfyui/ComfyUI/models/unet/flux1DevFp8_v10.safetensors",
    torch_dtype=dtype,
)

And got:

SafetensorError: Error while deserializing header: InvalidHeaderDeserialization

During handling of the above exception, another exception occurred:

and OSError: Unable to load weights from checkpoint file for …

I did try multiple dtypes.

I'm on Windows 10 using conda; Python 3.10.13, torch 2.2.2, diffusers 0.31.0.


I think you need to do it this way:

transformer = FluxTransformer2DModel.from_single_file("path_to_comfyui/ComfyUI/models/unet/flux1DevFp8_v10.safetensors", subfolder="transformer", torch_dtype=dtype)

Thanks for your answer.
Unfortunately, it still gives me the same SafetensorError: Error while deserializing header: InvalidHeaderDeserialization error.

Tested again with the bfloat16, float16, and qfloat8 formats.


That's strange. The error message says the safetensors file is corrupted, but if it works in ComfyUI, it's hard to believe it's really corrupted. Could it be a bug related to the library versions?

pip install -U safetensors diffusers
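You can also check whether the header itself is readable, independent of diffusers (same path as in your post):

from safetensors import safe_open

# If this fails with the same InvalidHeaderDeserialization, the file itself
# (or the installed safetensors version) is the problem, not diffusers.
with safe_open("path_to_comfyui/ComfyUI/models/unet/flux1DevFp8_v10.safetensors", framework="pt") as f:
    print(len(f.keys()), "tensors; metadata:", f.metadata())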

I did delete my whole conda env to be on the latest available versions of diffusers & safetensors. I'm now on Python 3.12.7, torch 2.5.1-cu11.8, diffusers 0.31.0, safetensors 0.4.5.

I still get an error when moving the pipe to CUDA with pipe.to("cuda"): ImportError: DLL load failed while importing quanto_cuda: no module found.


For FLUX or SD3.5, you may also need:

pip install optimum-quanto bitsandbytes torchao transformers "numpy<2" sentencepiece peft accelerate

I did, and I still got an issue at loading:

RuntimeError: Failed to import diffusers.models.transformers.transformer_flux because of the following error (look up to see its traceback):
Failed to import diffusers.loaders.unet because of the following error (look up to see its traceback):
Failed to import transformers.modeling_utils because of the following error (look up to see its traceback):
cannot import name 'quantize_' from 'torchao.quantization' (e:\conda\envs\diffuser\Lib\site-packages\torchao\quantization\__init__.py)

My pip list:

Package            Version
------------------ -----------
accelerate         1.1.1
asttokens          2.0.5
bitsandbytes       0.44.1
Brotli             1.0.9
certifi            2024.8.30
charset-normalizer 3.3.2
colorama           0.4.6
comm               0.2.1
debugpy            1.6.7
decorator          5.1.1
diffusers          0.31.0
executing          0.8.3
filelock           3.13.1
fsspec             2024.10.0
huggingface-hub    0.26.2
idna               3.7
importlib_metadata 8.5.0
ipykernel          6.29.5
ipython            8.27.0
jedi               0.19.1
Jinja2             3.1.4
jupyter_client     8.6.0
jupyter_core       5.7.2
MarkupSafe         2.1.3
matplotlib-inline  0.1.6
mkl_fft            1.3.11
mkl_random         1.2.8
mkl-service        2.4.0
mpmath             1.3.0
nest-asyncio       1.6.0
networkx           3.2.1
ninja              1.11.1.1
numpy              1.26.4
optimum-quanto     0.2.6
packaging          24.1
parso              0.8.3
peft               0.13.2
pillow             10.4.0
pip                24.2
platformdirs       3.10.0
prompt-toolkit     3.0.43
protobuf           5.28.3
psutil             5.9.0
pure-eval          0.2.2
Pygments           2.15.1
PySocks            1.7.1
python-dateutil    2.9.0.post0
pywin32            305.1
PyYAML             6.0.2
pyzmq              25.1.2
regex              2024.11.6
requests           2.32.3
safetensors        0.4.5
sentencepiece      0.2.0
setuptools         75.1.0
six                1.16.0
stack-data         0.2.0
sympy              1.13.1
tokenizers         0.20.3
torch              2.5.1
torchao            0.1
torchaudio         2.5.1
torchvision        0.20.1
tornado            6.4.1
tqdm               4.67.0
traitlets          5.14.3
transformers       4.46.2
typing_extensions  4.11.0
urllib3            2.2.3
wcwidth            0.2.5
wheel              0.44.0
win-inet-pton      1.1.0
zipp               3.21.0

torchao seems to be out of date. Try this:

pip install -U optimum-quanto bitsandbytes torchao transformers sentencepiece peft accelerate
pip install "numpy<2"
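Then confirm what actually got picked up (the second line exercises exactly the import that failed in your traceback):

pip show torchao
python -c "from torchao.quantization import quantize_"  # should not raise on a recent torchao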

Upgrading torchao (pip install torchao==0.6.1) gives:

ERROR: Could not find a version that satisfies the requirement torchao==0.6.1 (from versions: 0.0.1, 0.0.3, 0.1)
ERROR: No matching distribution found for torchao==0.6.1

I think there is no release for cu11.8; I may need to upgrade to torch 2.5.1-cu121 [torchao GitHub].


torch=2.5.1-cu121

Yeah. Or torch==2.4.0 with cu121 to cu124 is relatively stable.
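For example, with the cu121 index from the PyTorch install page:

pip install torch==2.4.0 torchvision torchaudio --index-url https://download.pytorch.org/whl/cu121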

Good. Last resort:

pip install git+https://github.com/pytorch/ao

I did change my torch version to 2.4.0-cu121 and still had to build torchao from the repo. When doing inference, the pipe seems to not be running on the GPU, though torch itself works on CUDA. When I do pipe.to("cuda"), it gives me ImportError: DLL load failed while importing quanto_cuda.
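I verified torch itself roughly like this:

import torch

print(torch.cuda.is_available())      # True here
print(torch.version.cuda)             # should report 12.1 for a cu121 build
print(torch.cuda.get_device_name(0))  # my GPU shows up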

Hmmm. Possibly:

Anyway, if it doesn't work up to this point, it's more likely a CUDA-related installation error than a problem with the Python libraries. I also use Windows, and Windows is very prone to errors like this.

I did try my conda environment on another 1.5 model and got no issues whatsoever (AnalogMadness loaded from a single file, 50 steps in 6.4 s). The latest link you gave me is more about dlib installation on Windows; so far the diffusers library does not rely on that. I did get rid of all the quantization functions since the latest error came from them; the model is loaded on the GPU, but it still takes forever to generate an image.

Still loading after 20 minutes. For comparison, a FLUX inference with the same model in ComfyUI takes between 30 seconds and a minute on my GPU.


If you don't quantize, FLUX is over 30 GB, so it won't fit in VRAM; that's why it takes forever…
If all goes well it should work, but it's not very compatible with Windows environments.
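If optimum-quanto behaves, the usual flow is roughly this (quantize/freeze/qfloat8 are the optimum.quanto helpers you already imported; a sketch, not tested on your setup):

from optimum.quanto import freeze, qfloat8, quantize

# Quantize the transformer weights to float8 in place, then freeze them
quantize(transformer, weights=qfloat8)
freeze(transformer)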

Thanks for your answers. I tried an implementation with quantization_config = BitsAndBytesConfig(load_in_8bit=True):

import torch

from diffusers import FluxTransformer2DModel, FluxPipeline, BitsAndBytesConfig
from transformers import T5EncoderModel, CLIPTextModel

bfl_repo = "black-forest-labs/FLUX.1-dev"

dtype = torch.bfloat16
#dtype = torch.float16
#dtype = torch.float8_e4m3fn

quantization_config = BitsAndBytesConfig(load_in_8bit=True)

transformer = FluxTransformer2DModel.from_single_file(
    ".../flux1DevFp8_v10.safetensors",
    quantization_config=quantization_config,
    dtype=dtype,
)

pipe = FluxPipeline.from_pretrained(
    bfl_repo,
    transformer=transformer,
    text_encoder_2=None,
    quantization_config=quantization_config,
    dtype=dtype,
)

text_encoder_2 = T5EncoderModel.from_pretrained(
    bfl_repo, subfolder="text_encoder_2", quantization_config=quantization_config
)
pipe.text_encoder_2 = text_encoder_2

prompt = "A cat holding a sign that says hello world"
image = pipe(
    prompt,
    guidance_scale=3.5,
    output_type="pil",
    num_inference_steps=20,
    #generator=torch.Generator("cpu").manual_seed(0)
).images[0]

And I still got RuntimeError: mat1 and mat2 must have the same dtype, but got Half and Float, so I tried different dtypes (bfloat16, float16, float8, float8_e4m3fn).

What bugs me is that it works well in ComfyUI, although its environment doesn't require quanto.

I did output a verbose log from a ComfyUI generation.


The ComfyUI log looks fine.
From the code, it looks like all the components are being loaded with 8-bit quantization by bitsandbytes, so why is there a dtype mismatch between mat1 and mat2…? :exploding_head:
My PC is too weak for FLUX, so I’ll try it on the cloud on HF later.

I found a bug. torch_dtype= is the correct argument for Diffusers.

transformer = FluxTransformer2DModel.from_single_file(".../flux1DevFp8_v10.safetensors", quantization_config=quantization_config, torch_dtype=dtype)
text_encoder_2 = T5EncoderModel.from_pretrained(bfl_repo, subfolder="text_encoder_2", quantization_config=quantization_config)
pipe = FluxPipeline.from_pretrained(bfl_repo, transformer=transformer, text_encoder_2=text_encoder_2, quantization_config=quantization_config, torch_dtype=dtype)
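With that, inference should look roughly like this (enable_model_cpu_offload is optional but helps it fit in VRAM; prompt and settings are the ones from your post):

pipe.enable_model_cpu_offload()  # keep idle components on CPU to fit in VRAM

image = pipe(
    "A cat holding a sign that says hello world",
    guidance_scale=3.5,
    num_inference_steps=20,
).images[0]
image.save("flux.png")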