Running FLUX.1 on a GPU with 8 GB of VRAM sounds pretty unreasonable, but apparently it’s currently possible if you drastically lower the output resolution…?
Personally, though, I’d recommend buying a higher-VRAM GPU or renting one via the cloud… I’m not too familiar with that myself… Hugging Face also offers GPU rentals, but you need programming knowledge to use them.
Use one of these three working paths on an RX 7600 (8 GB). Primary: ComfyUI + GGUF + x-flux. Backup: Forge (AMD fork) via ZLUDA/DirectML. Lightweight: sd.cpp GUI (Vulkan). All three load FLUX base + your FLUX LoRA and fit in 8 GB with Q4_0 quant. (Hugging Face)
Path 1 — ComfyUI + GGUF + x-flux (recommended)
What this does
- Runs FLUX.1-dev as a GGUF quant so it fits in 8 GB.
- Loads CLIP-L, T5-XXL, and the Flux VAE (ae.safetensors).
- Applies your Flux LoRA in the x-flux node. (Hugging Face)
Install the two add-ons
REM ComfyUI-GGUF: loads Flux .gguf Unet + optional T5 gguf
REM https://github.com/city96/ComfyUI-GGUF
git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
.\python_embeded\python.exe -s -m pip install -r .\ComfyUI\custom_nodes\ComfyUI-GGUF\requirements.txt
REM x-flux-comfyui: Flux LoRA / ControlNet nodes + low-memory recipe
REM https://github.com/XLabs-AI/x-flux-comfyui
git clone https://github.com/XLabs-AI/x-flux-comfyui ComfyUI/custom_nodes/x-flux-comfyui
.\python_embeded\python.exe .\ComfyUI\custom_nodes\x-flux-comfyui\setup.py
x-flux creates ComfyUI/models/xlabs/loras/ on first launch. Put your LoRA there. It also documents a low-VRAM launch and the swap from “Load Diffusion Model” → “Unet Loader (GGUF)”. (GitHub)
Download models and place files
ComfyUI/
└─ models/
   ├─ unet/
   │  └─ flux1-dev-Q4_0.gguf              # base model (≈6.79 GB)
   ├─ text_encoders/
   │  ├─ clip_l.safetensors
   │  └─ t5xxl_fp8_e4m3fn.safetensors     # start with FP8 to save VRAM
   ├─ vae/
   │  └─ ae.safetensors                   # Flux VAE
   └─ xlabs/
      └─ loras/
         └─ your_flux_lora.safetensors
- GGUF sizes (Q4_0 ≈ 6.79 GB, Q2_K ≈ 4.03 GB) are on the model card; Q4_0 is the sweet spot for 8 GB. (Hugging Face)
- VAE: ae.safetensors is the standard Flux decoder; put it in models/vae/. (ComfyUI)
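If you prefer to fetch the files from a terminal, here is a hedged download sketch using huggingface-cli, assuming the CLI is installed and on PATH (pip install -U "huggingface_hub[cli]"). The repo IDs are assumptions taken from the model cards this guide cites; ae.safetensors lives in the gated black-forest-labs/FLUX.1-dev repo, so it requires a logged-in account that has accepted the license.
REM Hedged download sketch -- repo IDs assumed from the model cards cited above; verify before running
REM Base model quant -> ComfyUI/models/unet
huggingface-cli download city96/FLUX.1-dev-gguf flux1-dev-Q4_0.gguf --local-dir ComfyUI\models\unet
REM Text encoders -> ComfyUI/models/text_encoders
huggingface-cli download comfyanonymous/flux_text_encoders clip_l.safetensors --local-dir ComfyUI\models\text_encoders
huggingface-cli download comfyanonymous/flux_text_encoders t5xxl_fp8_e4m3fn.safetensors --local-dir ComfyUI\models\text_encoders
REM Flux VAE -> ComfyUI/models/vae (gated repo: run "huggingface-cli login" and accept the license first)
huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors --local-dir ComfyUI\models\vae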
Launch in low-VRAM mode
REM Low-memory flags from x-flux guide
REM https://github.com/XLabs-AI/x-flux-comfyui
.\python_embeded\python.exe ComfyUI\main.py --lowvram --preview-method auto --use-split-cross-attention
Then, in your workflow, replace “Load Diffusion Model” with Unet Loader (GGUF) and pick the .gguf. (GitHub)
Minimal graph wiring (text→image)
- Unet Loader (GGUF) → flux1-dev-Q4_0.gguf.
- DualCLIPLoader → clip_l.safetensors + t5xxl_fp8_e4m3fn.safetensors.
- Load VAE → ae.safetensors.
- Flux LoRA node → select your LoRA, start weight 0.8–1.0.
- Sampler → try DPM++ 2M with SGM uniform, 22–26 steps, guidance 1.5–2.5, 640–704 px single image.
Comfy’s official Flux tutorial shows file locations and loader nodes; use it to cross-check your UI. (ComfyUI)
If you see OOM or slow loads
- Stay at 576–704 px.
- Keep Q4_0. If still tight, use T5 GGUF via the GGUF loaders to reclaim VRAM.
- The ComfyUI-GGUF readme explains that DiT models like Flux quantize well, and LoRA works with the built-in loader. (GitHub)
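If those tips are still not enough, ComfyUI exposes more aggressive memory options; the launch below is a fallback sketch using standard ComfyUI flags, so confirm them against --help on your build.
REM Fallback sketch: force full offload and disable weight caching if --lowvram still OOMs
REM Confirm flags with: .\python_embeded\python.exe ComfyUI\main.py --help
.\python_embeded\python.exe ComfyUI\main.py --novram --disable-smart-memory --preview-method auto --use-split-cross-attention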
Why quant + offload is required
FLUX.1-dev has ~12B parameters; a full bf16 load needs roughly 33 GB across the DiT plus the two text encoders, so on 8 GB you must quantize. (Hugging Face)
Path 2 — Forge (AMD fork) with ZLUDA or DirectML
When to use: you want an A1111-style web UI instead of node graphs. This AMD fork exposes --use-zluda and --use-directml. Forge upstream documents Flux NF4/GGUF and LoRA support for Flux. (GitHub)
Install and start
REM AMD-ready Forge fork (pick one backend)
REM https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
git clone https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge forge-amd
cd forge-amd
REM Prefer ZLUDA on recent AMD; fall back to DirectML if needed
start webui-user.bat --use-zluda
:: or:
:: start webui-user.bat --use-directml
Backend flags are listed in that fork’s README. (GitHub)
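If a flag passed on the command line doesn’t take effect, the conventional A1111/Forge route is to bake it into webui-user.bat; the sketch below assumes the stock template shipped with the fork.
REM webui-user.bat sketch (assumed stock template; keep exactly one backend flag)
@echo off
set PYTHON=
set GIT=
set VENV_DIR=
set COMMANDLINE_ARGS=--use-zluda
REM set COMMANDLINE_ARGS=--use-directml
call webui.bat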
Put models in Forge folders
forge-amd/models/
├─ Stable-diffusion/ # flux1-dev-*.gguf
├─ VAE/ # ae.safetensors
├─ text_encoder/ # clip_l, t5xxl_* (fp8 is safer on 8 GB)
└─ Lora/ # your Flux LoRA
Forge’s readme states Flux NF4 (bitsandbytes) and GGUF checkpoints are supported, with LoRA support for these formats. Select UI mode: Flux, then choose checkpoint + VAE + text encoders in the UI. (GitHub)
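If you already fetched these files for the ComfyUI path, you can avoid duplicating several gigabytes by hard-linking them into the Forge folders; the sketch below assumes both installs sit on the same NTFS drive and that the paths match the layouts shown earlier.
REM Sketch: reuse ComfyUI's downloads in Forge via NTFS hard links (mklink /H, same drive only)
REM Paths are assumptions -- adjust to where your files actually live
mklink /H forge-amd\models\Stable-diffusion\flux1-dev-Q4_0.gguf ComfyUI\models\unet\flux1-dev-Q4_0.gguf
mklink /H forge-amd\models\VAE\ae.safetensors ComfyUI\models\vae\ae.safetensors
mklink /H forge-amd\models\text_encoder\clip_l.safetensors ComfyUI\models\text_encoders\clip_l.safetensors
mklink /H forge-amd\models\text_encoder\t5xxl_fp8_e4m3fn.safetensors ComfyUI\models\text_encoders\t5xxl_fp8_e4m3fn.safetensors
mklink /H forge-amd\models\Lora\your_flux_lora.safetensors ComfyUI\models\xlabs\loras\your_flux_lora.safetensors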
Notes: ZLUDA can be faster than DirectML but is more finicky to set up; the fork recommends ZLUDA for new AMD cards. (GitHub)
Path 3 — sd.cpp desktop GUI (Vulkan, Python-free)
When to use: you want a very light install that works well on AMD. stable-diffusion.cpp supports Flux-dev/Flux-schnell, LoRA, and Vulkan backends. Several GUIs wrap it. (GitHub)
Steps
- Install an sd.cpp GUI and set Backend = Vulkan.
- Add flux1-dev-Q4_0.gguf, ae.safetensors, and your LoRA, then generate.
The sd.cpp README lists Flux, LoRA, and Vulkan support explicitly. (GitHub)
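If you’d rather skip the GUI wrapper, the underlying sd.cpp binary can be driven directly; the call below is a sketch modelled on the Flux example in the sd.cpp README, so double-check the binary name, the flags, and the <lora:name:weight> prompt syntax against the release you download.
REM Sketch: direct sd.cpp CLI call (Vulkan build); verify flags against your release's --help
sd.exe --diffusion-model models\flux1-dev-Q4_0.gguf --vae models\ae.safetensors ^
  --clip_l models\clip_l.safetensors --t5xxl models\t5xxl_fp8_e4m3fn.safetensors ^
  --lora-model-dir models\loras -p "portrait photo of a person <lora:your_flux_lora:0.9>" ^
  --cfg-scale 1.0 --sampling-method euler --steps 24 -W 640 -H 640 -o flux_out.png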
Safe starting presets (portraits with your LoRA)
- 640–704 px square, 22–26 steps, guidance 1.5–2.5, LoRA weight 0.8–1.0.
- If likeness is weak, raise LoRA weight slowly. If artifacts appear, reduce weight or steps.
Comfy’s Flux tutorial confirms component layout and FP8 text-encoder option for low VRAM. (ComfyUI)
Quick reference commands you can paste
:: 1) Install ComfyUI-GGUF
:: https://github.com/city96/ComfyUI-GGUF
git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
.\python_embeded\python.exe -s -m pip install -r .\ComfyUI\custom_nodes\ComfyUI-GGUF\requirements.txt
:: 2) Install x-flux-comfyui (creates .../xlabs/loras on first run)
:: https://github.com/XLabs-AI/x-flux-comfyui
git clone https://github.com/XLabs-AI/x-flux-comfyui ComfyUI/custom_nodes/x-flux-comfyui
.\python_embeded\python.exe .\ComfyUI\custom_nodes\x-flux-comfyui\setup.py
:: 3) Low-VRAM launch for 8 GB
:: https://github.com/XLabs-AI/x-flux-comfyui
.\python_embeded\python.exe ComfyUI\main.py --lowvram --preview-method auto --use-split-cross-attention
:: 4) Forge AMD fork (pick one backend)
:: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
start webui-user.bat --use-zluda
:: or
start webui-user.bat --use-directml
Licensing note
FLUX.1-dev is non-commercial. GGUF conversions inherit the same license; check before selling outputs. (Hugging Face)
Why this fits in 8 GB (background)
- Full-precision Flux loads are ~33 GB bf16 across the DiT + text encoders.
- Q4_0 GGUF (~6.79 GB) shrinks the diffusion model enough to fit alongside the text encoders + VAE on an 8 GB card, especially with the low-VRAM flags or a GGUF T5. (Hugging Face)
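For the curious, the arithmetic behind those numbers (using commonly cited parameter counts, so treat them as approximations): the ~12B-parameter DiT at 2 bytes per weight is ~24 GB, the ~4.7B-parameter T5-XXL encoder adds ~9.4 GB, and CLIP-L plus the VAE contribute well under 1 GB, which lands near the ~33 GB bf16 figure. With Q4_0 the DiT drops to ~6.8 GB and an FP8 T5 (1 byte per weight) to ~4.9 GB; that still exceeds 8 GB combined, which is why the low-VRAM/offload flags (or a GGUF T5) are needed to keep only the active component resident.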
Short, curated extras
Core docs and model cards
- ComfyUI Flux tutorial: file locations, loaders, FP8 option. Useful to verify your node wiring. (ComfyUI)
- FLUX.1-dev GGUF model card: quant sizes and placement note (models/unet). Pick Q4_0 or Q2_K. (Hugging Face)
Add-ons
- ComfyUI-GGUF readme: Unet Loader (GGUF), T5 GGUF, LoRA support note. Good for low-VRAM behavior. (GitHub)
- x-flux-comfyui guide: LoRA folder path, low-memory launch flags. (GitHub)
Alternatives
- Forge AMD fork: ZLUDA/DirectML flags; upstream Forge’s Flux + LoRA support claim. Use if you prefer A1111-style UI. (GitHub)
- sd.cpp: Flux + LoRA + Vulkan. Use for a Python-free, AMD-friendly stack. (GitHub)