Hi, I'm encountering an issue while trying to use the FluxPipeline from the diffusers library. The pipeline seems to get stuck halfway through loading the components, with no error message displayed. Here are the details:
Environment:
Windows 10
Python 3.12.4
NVIDIA GeForce GTX 1650 GPU
CUDA version: 11.8
Code:
import torch
from huggingface_hub import login
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(torch.cuda.get_device_name(device))
print(torch.cuda.mem_get_info(device))
login(token="...")
from diffusers import FluxPipeline
print(torch.version.cuda)
pipe_cpu = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.float32
)
print('cpu')
pipe = pipe_cpu.to("cuda:0")
print("pip")
prompt = "A homeless cat holding a cardboard sign that says 'Hi Mom!'"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
print("image")
image.save("flux-dev.png")
print("save")
First, since you specified 32-bit precision, loading the whole pipeline requires at least 70GB of RAM or VRAM: the FLUX transformer (roughly 12B parameters) plus the T5 text encoder (roughly 4.7B parameters) already come to about 67GB at 4 bytes per parameter. Second, since you are trying to load the entire model onto the GPU with .to("cuda:0"), this would fail with an error if there were not enough free VRAM.
Try rewriting it as follows. I think it will work if you have about 40GB of RAM.
pip install -U accelerate
import torch
from huggingface_hub import login
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "expandable_segments:True"
device = torch.device("cuda:0" if torch.cuda.is_available() else "cpu")
print(torch.cuda.get_device_name(device))
print(torch.cuda.mem_get_info(device))
login(token="...")
from diffusers import FluxPipeline
print(torch.version.cuda)
pipe_cpu = FluxPipeline.from_pretrained(
    # bfloat16 halves the weight size vs. float32; device_map is omitted
    # because it cannot be combined with the CPU offloading enabled below
    "black-forest-labs/FLUX.1-dev", torch_dtype=torch.bfloat16
)
print('cpu')
#pipe = pipe_cpu.to("cuda:0")
pipe = pipe_cpu
# Stream weights to the GPU layer by layer; slow, but fits a 4GB card like the GTX 1650
pipe.enable_sequential_cpu_offload()
print("pip")
prompt = "A homeless cat holding a cardboard sign that says 'Hi Mom!'"
image = pipe(
    prompt,
    height=1024,
    width=1024,
    guidance_scale=3.5,
    num_inference_steps=50,
    max_sequence_length=512,
    generator=torch.Generator("cuda").manual_seed(0),
).images[0]
print("image")
image.save("flux-dev.png")
print("save")
It's difficult. First of all, the FLUX model exceeds 30GB even at 16-bit precision. Diffusers does not support anything below 16-bit precision except through quantization, and when quantizing, the entire model must in principle be placed on the GPU, so a GPU with about 12GB of VRAM is required (a sketch of that route follows below). The practical minimum for inference on the CPU is a little over 30GB of RAM.
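For reference, here is a minimal sketch of that quantized route, assuming diffusers >= 0.31 with bitsandbytes installed and a GPU in the 12GB class (not the GTX 1650); the 4-bit NF4 settings are a typical choice on my part, not something from your original script:

import torch
from diffusers import FluxPipeline, FluxTransformer2DModel, BitsAndBytesConfig

# Quantize only the large transformer to 4-bit NF4; text encoders and VAE stay in bf16
quant_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_quant_type="nf4",
    bnb_4bit_compute_dtype=torch.bfloat16,
)
transformer = FluxTransformer2DModel.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    subfolder="transformer",
    quantization_config=quant_config,
    torch_dtype=torch.bfloat16,
)
pipe = FluxPipeline.from_pretrained(
    "black-forest-labs/FLUX.1-dev",
    transformer=transformer,
    torch_dtype=torch.bfloat16,
)
pipe.enable_model_cpu_offload()  # keeps idle components in system RAM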
However, if you don't have enough RAM, the OS will usually fall back on the HDD or SSD as virtual memory, so it may still be possible to run.
It will be slow, but I personally think it will probably just about work even with 16GB of RAM. I don't recommend it, though...
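If you want to check up front whether your machine is anywhere near those thresholds, a quick sketch like this (psutil is an extra dependency I am assuming here) prints the free RAM and VRAM before you commit to a 50-step run:

import psutil
import torch

# Free system RAM in GiB (swap/pagefile not included)
print(f"free RAM:  {psutil.virtual_memory().available / 1024**3:.1f} GiB")

if torch.cuda.is_available():
    free_vram, total_vram = torch.cuda.mem_get_info()
    print(f"free VRAM: {free_vram / 1024**3:.1f} / {total_vram / 1024**3:.1f} GiB")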