Want my Flux LoRA model to work and also want to be able to train my own SD 1.5 and SDXL models

I made a model using the Flux Easy Trainer on Replicate.

Even though my model's filename ends with the .safetensors extension, I understand that I can only use the file with Flux as the base model.

I have two computers to work with.

One has an AMD RX 7600, and I have been using Amuse and RuinedFooocus with it.

The other has an Nvidia 3050 Ti, and I only really use RuinedFooocus through Stability Matrix.

I don't understand how to get the FLUX.1-dev model working with anything I have. I would really like to use my LoRA model so I can put my likeness in AI photos; however, I find the whole process painstaking and frustrating.

So I want two things.

First, I want to be able to make AI photos with my LoRA model, which contains my likeness and which I believe only works with Flux.

Secondly, I want to be able to train a model, either with Dreambooth or something similar, so I can use it with SD 1.5 and SDXL models.

I’m having a really hard time trying to do either.

Does anyone have any advice? I have been asking AI, but I think it runs me in circles sometimes.

Using FLUX.1 with an 8 GB VRAM GPU is quite a stretch, but apparently it's currently possible if you drastically lower the output resolution…?

Personally, though, I recommend buying a GPU or renting one via the cloud… I’m not too familiar with it myself… Hugging Face also offers GPU rentals, but users need programming knowledge to use them.


Use one of these three working paths on an RX 7600 (8 GB). Primary: ComfyUI + GGUF + x-flux. Backup: Forge (AMD fork) via ZLUDA/DirectML. Lightweight: sd.cpp GUI (Vulkan). All three load FLUX base + your FLUX LoRA and fit in 8 GB with Q4_0 quant. (Hugging Face)


Path 1 — ComfyUI + GGUF + x-flux (recommended)

What this does

  • Runs FLUX.1-dev as a GGUF quant so it fits in 8 GB.
  • Loads CLIP-L, T5-XXL, and the Flux VAE (ae.safetensors).
  • Applies your Flux LoRA in the x-flux node. (Hugging Face)

Install the two add-ons

REM ComfyUI-GGUF: loads Flux .gguf Unet + optional T5 gguf
REM https://github.com/city96/ComfyUI-GGUF
git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
.\python_embeded\python.exe -s -m pip install -r .\ComfyUI\custom_nodes\ComfyUI-GGUF\requirements.txt

REM x-flux-comfyui: Flux LoRA / ControlNet nodes + low-memory recipe
REM https://github.com/XLabs-AI/x-flux-comfyui
git clone https://github.com/XLabs-AI/x-flux-comfyui ComfyUI/custom_nodes/x-flux-comfyui
.\python_embeded\python.exe .\ComfyUI\custom_nodes\x-flux-comfyui\setup.py

x-flux creates ComfyUI/models/xlabs/loras/ on first launch. Put your LoRA there. It also documents a low-VRAM launch and the swap from “Load Diffusion Model” → “Unet Loader (GGUF)”. (GitHub)

Download models and place files

ComfyUI/
└─ models/
   ├─ unet/
   │  └─ flux1-dev-Q4_0.gguf          # base model (≈6.79 GB)
   ├─ text_encoders/
   │  ├─ clip_l.safetensors
   │  └─ t5xxl_fp8_e4m3fn.safetensors  # start with FP8 to save VRAM
   ├─ vae/
   │  └─ ae.safetensors                # Flux VAE
   └─ xlabs/
      └─ loras/
         └─ your_flux_lora.safetensors
  • GGUF sizes (Q4_0 ≈ 6.79 GB, Q2_K ≈ 4.03 GB) are on the model card; Q4_0 is the sweet spot for 8 GB. (Hugging Face)
  • VAE ae.safetensors is the standard Flux decoder; put it in models/vae/. (ComfyUI)
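
If you'd rather script the downloads, here is a rough sketch using huggingface-cli. The repo names are the usual sources (city96's GGUF conversion, the comfyanonymous text-encoder repo, and the official FLUX.1-dev repo for the VAE, which is gated, so log in and accept the license first); verify the exact filenames on each model card before running.

REM Run from the ComfyUI portable root. Needs: pip install -U "huggingface_hub[cli]"
REM and "huggingface-cli login" for the gated black-forest-labs/FLUX.1-dev repo.
huggingface-cli download city96/FLUX.1-dev-gguf flux1-dev-Q4_0.gguf --local-dir ComfyUI\models\unet
huggingface-cli download comfyanonymous/flux_text_encoders clip_l.safetensors --local-dir ComfyUI\models\text_encoders
huggingface-cli download comfyanonymous/flux_text_encoders t5xxl_fp8_e4m3fn.safetensors --local-dir ComfyUI\models\text_encoders
huggingface-cli download black-forest-labs/FLUX.1-dev ae.safetensors --local-dir ComfyUI\models\vae
REM Your own LoRA from Replicate goes into the x-flux folder (placeholder filename):
copy "%USERPROFILE%\Downloads\your_flux_lora.safetensors" ComfyUI\models\xlabs\loras\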

Launch in low-VRAM mode

REM Low-memory flags from x-flux guide
REM https://github.com/XLabs-AI/x-flux-comfyui
.\python_embeded\python.exe ComfyUI\main.py --lowvram --preview-method auto --use-split-cross-attention

Then, in your workflow, replace “Load Diffusion Model” with Unet Loader (GGUF) and pick the .gguf. (GitHub)

Minimal graph wiring (text→image)

  • Unet Loader (GGUF) → flux1-dev-Q4_0.gguf.
  • DualCLIPLoader → clip_l.safetensors + t5xxl_fp8_e4m3fn.safetensors.
  • Load VAE → ae.safetensors.
  • Flux LoRA node → select your LoRA, start weight 0.8–1.0.
  • Sampler → try DPM++ 2M with SGM uniform, 22–26 steps, guidance 1.5–2.5, 640–704 px single image.
    Comfy’s official Flux tutorial shows file locations and loader nodes; use it to cross-check your UI. (ComfyUI)

If you see OOM or slow loads

  • Stay at 576–704 px.
  • Keep Q4_0. If still tight, switch the T5 encoder to a GGUF via the GGUF loaders to reclaim VRAM (example below).
  • The ComfyUI-GGUF readme explains that DiT models like Flux quantize well, and LoRA works with the built-in loader. (GitHub)
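
If you do switch T5 to GGUF, the idea is to replace DualCLIPLoader with the GGUF dual-CLIP loader that ComfyUI-GGUF adds (it should show up as DualCLIPLoader (GGUF) in the node search) and point it at a quantized T5 encoder. A hedged example, assuming city96's T5 encoder GGUF repo; the quant filename here is a guess, so check the model card for the exact name:

REM Quantized T5 encoder to free a little more VRAM (verify the filename on the card)
huggingface-cli download city96/t5-v1_1-xxl-encoder-gguf t5-v1_1-xxl-encoder-Q5_K_M.gguf --local-dir ComfyUI\models\text_encoders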

Why quant + offload is required

FLUX.1-dev is ~12B parameters. A full bf16 load needs roughly 33 GB across the DiT + two text encoders, so you must quantize to run on 8 GB. (Hugging Face)
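
Rough arithmetic behind that ~33 GB figure (bf16 is 2 bytes per parameter; all numbers approximate):

DiT, ~12B params        × 2 bytes ≈ 24 GB
T5-XXL encoder, ~4.7B   × 2 bytes ≈ 9.5 GB
CLIP-L, ~0.12B          × 2 bytes ≈ 0.25 GB
Total (plus VAE overhead)         ≈ 33–34 GB, versus ~6.8 GB for the Q4_0 GGUF of the DiT alone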


Path 2 — Forge (AMD fork) with ZLUDA or DirectML

When to use: you want an A1111-style web UI instead of node graphs. This AMD fork exposes --use-zluda and --use-directml. Forge upstream documents Flux NF4/GGUF and LoRA support for Flux. (GitHub)

Install and start

REM AMD-ready Forge fork (pick one backend)
REM https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
git clone https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge forge-amd
cd forge-amd

REM Prefer ZLUDA on recent AMD; fall back to DirectML if needed
start webui-user.bat --use-zluda
:: or:
:: start webui-user.bat --use-directml
:: If the flag isn't picked up, set it inside webui-user.bat instead:
::   set COMMANDLINE_ARGS=--use-zluda   (or --use-directml)

Backend flags are listed in that fork’s README. (GitHub)

Put models in Forge folders

forge-amd/models/
├─ Stable-diffusion/   # flux1-dev-*.gguf
├─ VAE/                # ae.safetensors
├─ text_encoder/       # clip_l, t5xxl_* (fp8 is safer on 8 GB)
└─ Lora/               # your Flux LoRA

Forge's readme states that Flux BNB NF4 and GGUF checkpoints are supported, with LoRA support for these formats. Select UI mode: Flux, then choose checkpoint + VAE + text encoders in the UI. (GitHub)
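
If you already downloaded the files for Path 1, you can copy them over instead of re-downloading. A minimal sketch, assuming the ComfyUI portable folder and forge-amd sit next to each other (adjust paths to your layout; the LoRA filename is a placeholder):

REM Reuse the Path 1 downloads for Forge
copy ComfyUI\models\unet\flux1-dev-Q4_0.gguf forge-amd\models\Stable-diffusion\
copy ComfyUI\models\vae\ae.safetensors forge-amd\models\VAE\
copy ComfyUI\models\text_encoders\clip_l.safetensors forge-amd\models\text_encoder\
copy ComfyUI\models\text_encoders\t5xxl_fp8_e4m3fn.safetensors forge-amd\models\text_encoder\
copy "%USERPROFILE%\Downloads\your_flux_lora.safetensors" forge-amd\models\Lora\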

Notes: ZLUDA can be faster than DirectML but is more finicky to set up; the fork recommends ZLUDA for new AMD cards. (GitHub)


Path 3 — sd.cpp desktop GUI (Vulkan, Python-free)

When to use: you want a very light install that works well on AMD. stable-diffusion.cpp supports Flux-dev/Flux-schnell, LoRA, and Vulkan backends. Several GUIs wrap it. (GitHub)

Steps

  1. Install an sd.cpp GUI and set Backend = Vulkan.
  2. Add flux1-dev-Q4_0.gguf, ae.safetensors, and your LoRA, then generate.
    The sd.cpp README lists Flux, LoRA, and Vulkan support explicitly. (GitHub)
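
If you'd rather skip the GUI wrapper, the underlying sd.cpp CLI can do the same job from a terminal. A hedged sketch based on the Flux example in the sd.cpp README; flag names can shift between releases and the paths here are placeholders, so compare against the README of the build you download:

REM sd.cpp (Vulkan build) text-to-image with Flux + your LoRA
REM LoRAs are referenced with the <lora:filename:weight> tag plus --lora-model-dir
REM If your build rejects the fp8 T5 safetensors, use t5xxl_fp16.safetensors (as in the README example)
sd.exe --diffusion-model models\flux1-dev-Q4_0.gguf --vae models\ae.safetensors ^
  --clip_l models\clip_l.safetensors --t5xxl models\t5xxl_fp8_e4m3fn.safetensors ^
  --lora-model-dir models\loras -p "a photo of a man standing in a park <lora:your_flux_lora:0.8>" ^
  --cfg-scale 1.0 --sampling-method euler --steps 24 -W 640 -H 640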

Safe starting presets (portraits with your LoRA)

  • 640–704 px square, 22–26 steps, guidance 1.5–2.5, LoRA weight 0.8–1.0.
  • If likeness is weak, raise LoRA weight slowly. If artifacts appear, reduce weight or steps.
    Comfy’s Flux tutorial confirms component layout and FP8 text-encoder option for low VRAM. (ComfyUI)

Quick reference commands you can paste

:: 1) Install ComfyUI-GGUF
:: https://github.com/city96/ComfyUI-GGUF
git clone https://github.com/city96/ComfyUI-GGUF ComfyUI/custom_nodes/ComfyUI-GGUF
.\python_embeded\python.exe -s -m pip install -r .\ComfyUI\custom_nodes\ComfyUI-GGUF\requirements.txt
:: 2) Install x-flux-comfyui (creates .../xlabs/loras on first run)
:: https://github.com/XLabs-AI/x-flux-comfyui
git clone https://github.com/XLabs-AI/x-flux-comfyui ComfyUI/custom_nodes/x-flux-comfyui
.\python_embeded\python.exe .\ComfyUI\custom_nodes\x-flux-comfyui\setup.py
:: 3) Low-VRAM launch for 8 GB
:: https://github.com/XLabs-AI/x-flux-comfyui
.\python_embeded\python.exe ComfyUI\main.py --lowvram --preview-method auto --use-split-cross-attention
:: 4) Forge AMD fork (pick one backend; if the flag isn't picked up,
::    put it in webui-user.bat as: set COMMANDLINE_ARGS=--use-zluda)
:: https://github.com/lshqqytiger/stable-diffusion-webui-amdgpu-forge
start webui-user.bat --use-zluda
:: or
start webui-user.bat --use-directml

Licensing note

FLUX.1-dev is released under a non-commercial license. GGUF conversions inherit the same license; check the terms before selling outputs. (Hugging Face)


Why this fits in 8 GB (background)

  • Full-precision Flux loads are ~33 GB bf16 across the DiT + text encoders.
  • Q4_0 GGUF (~6.79 GB) shrinks the diffusion model enough to fit alongside the text encoders + VAE on an 8 GB card, especially with low-VRAM flags or a GGUF T5. (Hugging Face)

Short, curated extras

Core docs and model cards

  • ComfyUI Flux tutorial: file locations, loaders, FP8 option. Useful to verify your node wiring. (ComfyUI)
  • FLUX.1-dev GGUF sizes and placement note (models/unet). Pick Q4_0 or Q2_K. (Hugging Face)

Add-ons

  • ComfyUI-GGUF readme: Unet Loader (GGUF), T5 GGUF, LoRA support note. Good for low-VRAM behavior. (GitHub)
  • x-flux-comfyui guide: LoRA folder path, low-memory launch flags. (GitHub)

Alternatives

  • Forge AMD fork: ZLUDA/DirectML flags; upstream Forge’s Flux + LoRA support claim. Use if you prefer A1111-style UI. (GitHub)
  • sd.cpp: Flux + LoRA + Vulkan. Use for a Python-free, AMD-friendly stack. (GitHub)

John6666, you are a lifesaver with these detailed posts. Are you AI?

I have gotten Fooocus to work on my AMD 7600 machine with surprisingly good results. Image generation takes a while and there are issues with running out of GPU memory; however, I have made some really excellent pictures using SDXL.

However, do you happen to know how I can train an SDXL LoRA with my likeness? I want to be able to put myself in photos.

Are you AI?

I mainly use AI as a search assistant, but I myself am just a “flesh-and-blood middle-aged guy.”:sweat_smile:

how I can train an SDXL LoRA with my likeness? I want to be able to put myself in photos.

Use one of the well-known scripts.


You can train an SDXL LoRA of your own face on Windows with an RX 7600 today. The two workable paths are:

  • Windows-native: install ZLUDA to run CUDA-based trainers on AMD, then use a GUI trainer like OneTrainer or SD-Trainer. This is the fastest way to stay in Windows. (GitHub)
  • WSL2 fallback (more stable): run kohya_ss inside WSL2 with ROCm/PyTorch, then copy the LoRA to Fooocus. AMD now ships a PyTorch on Windows Preview, but training maturity varies, so WSL2 still wins on reliability. (GitHub)

Below is a beginner-friendly, step-by-step recipe that is detailed, redundant on purpose, and geared for 8 GB VRAM.


What you’re building

  • A LoRA file (.safetensors) trained on SDXL that acts like a “personal filter.”
  • You’ll trigger it with a unique token in your prompt (e.g., a photo of <mynick> person), and you’ll use it in Fooocus by placing it in Fooocus\models\loras. SDXL LoRA is required for SDXL; SD1.5 LoRA won’t work. (GitHub)

Step 0 — Choose your toolchain

Option A: Windows-native with ZLUDA + GUI (fastest to start)

  1. Install ZLUDA following the SD.Next wiki. This lets CUDA-expecting trainers run on AMD. (GitHub)

  2. Pick a GUI trainer:

    • OneTrainer → double-click start-ui.bat. Choose the SDXL LoRA template. Good defaults and logs. (GitHub)
    • SD-Trainer (Akegarasu) → GUI front-end that uses kohya’s trainer under the hood. (GitHub)
  3. If ZLUDA fails with a given GUI, try the other one. If both fail, use Option B.

Option B: WSL2 + ROCm + kohya_ss (most robust)

  • Install WSL2 Ubuntu, ROCm-enabled PyTorch, then kohya_ss (GUI included). Train in Linux userspace, copy the .safetensors to Windows. (GitHub)
  • AMD’s PyTorch on Windows Preview exists, but treat it as experimental for training on RX 7600. (AMD)
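
For orientation, the WSL2 route looks roughly like the sketch below. Treat it as a starting point, not a recipe: ROCm support for the RX 7600 (gfx1102) is patchy, and the HSA_OVERRIDE_GFX_VERSION=11.0.0 override is a commonly reported workaround rather than an official setting. Check AMD's current ROCm docs and the kohya_ss README for the exact, up-to-date steps.

:: From Windows: install a WSL2 Ubuntu distro
wsl --install -d Ubuntu-22.04

# Inside Ubuntu: ROCm build of PyTorch, then kohya_ss
# (the ROCm index URL changes over time; check pytorch.org for the current one)
pip3 install torch torchvision --index-url https://download.pytorch.org/whl/rocm6.1
git clone https://github.com/bmaltais/kohya_ss && cd kohya_ss && ./setup.sh
# Commonly reported workaround for consumer RDNA3 cards like the RX 7600:
export HSA_OVERRIDE_GFX_VERSION=11.0.0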

Step 1 — Collect your photos (the dataset)

Aim for 35–75 images. More variety beats sheer quantity.

Variety to include:

  • Angles: straight, 3/4, profile; tilt up/down.
  • Framing: close headshots, half-body, full-body.
  • Contexts: indoor, outdoor, day/night, different rooms and places.
  • Expressions: neutral, smile, serious; eyes open; avoid sunglasses.
  • Wardrobe: several tops, with/without hats, with/without glasses.
  • Backgrounds: mixed scenes so the model learns to keep you consistent, not the wall behind you.

Quality rules:

  • Sharp focus, no heavy filters, no duplicates, minimal obstructions.
  • Keep EXIF if convenient, but it’s not required.

Optional:

  • Regularization images (generic “person” photos) can reduce style drift. Impact is scenario-dependent; you can add them later if you see overfitting. (AboutMe)

Step 2 — Prepare the dataset

  1. Resolution: SDXL “native” is 1024 px, but 8 GB VRAM benefits from 896 px or 768 px with bucketing enabled to fit diverse aspect ratios. (GitHub)

  2. Crop obvious noise, keep your face reasonably large in frame.

  3. Captioning strategy:

    • Choose a unique token that does not collide with real words, e.g., sksmynick or zzbloraperson.

    • Use simple, explicit captions that always include your token and the class word “person”. Examples:

      • a photo of sksmynick person, wearing a black t-shirt, smiling, indoor
      • portrait of sksmynick person, outdoor, night, city lights, 3/4 view
    • You can auto-caption with BLIP or WD14 in kohya/SD-Trainer, then edit to add your token consistently. WD14/BLIP are built into kohya/SD-Trainer; issues and fixes are tracked in their repos. (GitHub)

    • If you prefer a third-party captioner, generate texts and drop .txt files next to each image. kohya supports caption templates and TOML datasets. (GitHub)

Folder idea:

dataset/
  person/                       # your images
    0001.jpg
    0001.txt  # "a photo of sksmynick person, close-up, smiling"
    ...
# optional regularization images
  reg_person/
    reg_0001.jpg
    reg_0001.txt  # "a photo of a person"
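
To bootstrap the captions, you can write one identical starter caption next to every image and then hand-edit each file to add details (outfit, location, expression). A minimal .bat sketch, assuming the layout above and the sksmynick token; run it from the folder that contains dataset\:

REM make_captions.bat - writes a starter caption next to every .jpg; edit the files afterwards
for %%f in (dataset\person\*.jpg) do (
  echo a photo of sksmynick person> "dataset\person\%%~nf.txt"
)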

Step 3 — Training settings that fit 8 GB

These map cleanly to OneTrainer, SD-Trainer, or kohya_ss SDXL LoRA.

  • Base model: sd_xl_base_1.0 or a derivative you like.
  • Image size: start 896×896 or 768×768, Enable bucket.
  • Batch size: 1. Use gradient accumulation 4–8.
  • Precision: fp16 (or bf16 if supported and stable).
  • Train target: U-Net only first. Add text encoder later if identity is weak. sd-scripts tracks U-Net/Text-encoder controls for SDXL. (GitHub)
  • Network (LoRA): rank=8–16, alpha=rank.
  • Learning rate: start 5e-5. If faces mush, go 3e-5.
  • Steps: begin with 2–4 epochs over your set, or 2k–4k steps.
  • Shuffle images each epoch.
  • Disable heavy regularization at first. Add class images only if style bleed appears.
  • Save every N steps (e.g., 200–500) so you can A/B test.
  • Advanced knobs you can add later: block-wise rank/LR, masked-loss, min-SNR. sd-scripts supports block-wise rank/LR for SDXL LoRA. (GitHub)

Step 4 — Do it in your chosen tool

A) OneTrainer (GUI, Windows-native via ZLUDA)

  1. Start start-ui.bat. Choose the SDXL LoRA template. (GitHub)

  2. Point Dataset to your dataset/person and optional dataset/reg_person.

  3. Set Base to your SDXL checkpoint.

  4. Enter Token such as sksmynick.

  5. Use the 8 GB preset from above: 896 or 768, batch 1, accum 8, rank 8–16, LR 5e-5.

  6. Hit Train. Keep an eye on VRAM in logs. If OOM, drop to 768 or reduce rank to 8.

    • Community notes show this template flow works well for many users. (Reddit)

B) SD-Trainer (GUI, Windows-native via ZLUDA)

  • Similar flow. It’s a GUI wrapper around kohya’s trainer. Pick SDXL LoRA, set dataset, token, base, then train. Logs show steps and losses. (GitHub)

C) kohya_ss (GUI/CLI; best documented)

  • GUI exposes the same fields. If you prefer CLI, a safe starter command:
# SDXL LoRA starter (8 GB-friendly). Written for WSL2/Linux (Option B); edit paths.
# On WSL2, Windows drives are mounted under /mnt, e.g. X:\dataset -> /mnt/x/dataset
# NOTE: --train_data_dir expects kohya's folder convention: subfolders named
#       "<repeats>_<token> <class>", e.g. dataset/10_sksmynick person
# Refs:
#   https://github.com/kohya-ss/sd-scripts           (sdxl_train_network.py)
#   https://github.com/bmaltais/kohya_ss             (kohya_ss GUI)
accelerate launch sdxl_train_network.py \
  --pretrained_model_name_or_path "/mnt/x/models/sd_xl_base_1.0.safetensors" \
  --train_data_dir "/mnt/x/dataset" \
  --output_dir "/mnt/x/out_lora" \
  --resolution "896,896" --enable_bucket \
  --caption_extension ".txt" \
  --network_module "networks.lora" --network_train_unet_only \
  --network_dim 16 --network_alpha 16 \
  --train_batch_size 1 --gradient_accumulation_steps 8 \
  --learning_rate 5e-5 --optimizer_type "adamw8bit" \
  --max_train_steps 3000 --save_every_n_steps 300 \
  --mixed_precision "fp16"
# If adamw8bit (bitsandbytes) fails to load on AMD/ROCm, fall back to --optimizer_type "AdamW".
  • If you prefer TOML datasets with per-subset settings, use the sd-scripts dataset config format; a minimal example is sketched below. (GitHub)
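
For reference, a minimal dataset TOML in the sd-scripts format might look like this (paths and num_repeats are placeholders; see the sd-scripts dataset-config README for the full key list). You would pass it with --dataset_config dataset.toml instead of --train_data_dir.

# dataset.toml - minimal SDXL LoRA dataset config (placeholder paths)
[general]
enable_bucket = true
caption_extension = ".txt"

[[datasets]]
resolution = 896
batch_size = 1

  [[datasets.subsets]]
  image_dir = "/mnt/x/dataset/person"
  num_repeats = 10

  # optional regularization subset
  [[datasets.subsets]]
  image_dir = "/mnt/x/dataset/reg_person"
  is_reg = true
  class_tokens = "person"
  num_repeats = 1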

Step 5 — Test the LoRA in Fooocus

  1. Copy your .safetensors to Fooocus\models\loras. Restart or refresh. (GitHub)

  2. In Advanced → enable the LoRA and set weight 0.6–0.9 to start. Many LoRAs also need the trigger token in the positive prompt. (GitHub)

  3. Sample prompts:

    • Portrait:
      a photo of sksmynick person, studio portrait, shallow depth of field, 85mm, soft light
    • Full-body, outdoor:
      sksmynick person, full body, walking in a street market, golden hour, candid, 35mm photo
    • Cinematic:
      sksmynick person in a coffee shop, window light, film still, 4k, detailed background

If the likeness is weak, raise the LoRA weight slightly, or retrain a bit longer.


Troubleshooting on RX 7600 + Windows

  • OOM / Crashes: lower resolution to 768, set rank to 8, or raise gradient accumulation. Keep batch at 1. (GitHub)
  • Overfitting / “Same shirt everywhere”: add more varied outfits and backgrounds; add a few regularization images; lower LR to 3e-5. (AboutMe)
  • Captions look noisy: auto-caption first, then hand-edit to ensure every image includes your unique token + person. WD14/BLIP are common; issues and fixes are tracked in kohya repos. (GitHub)
  • Windows-native instability: if ZLUDA hiccups, switch trainer (OneTrainer ↔ SD-Trainer). If still flaky, WSL2 + kohya_ss is the escape hatch. (GitHub)
  • Preview drivers: AMD’s Windows PyTorch Preview is evolving. Expect changes. Prefer WSL2 for serious work today. (AMD)

Quick settings cheat-sheet (repeat, on purpose)

  • Images: 35–75. Diverse angles, outfits, places.
  • Token: something unique, used in every caption with person.
  • Size: start 896 → fallback 768. Bucket on.
  • Batch/Acc: 1 / 8.
  • Rank/Alpha: 8–16 / same as rank.
  • LR: 5e-5 → 3e-5 if mushy.
  • Steps: ~3k first pass. Save every 300.
  • Train U-Net only first. Consider text-encoder later. (GitHub)
  • Export: .safetensors → Fooocus\models\loras. Use weight 0.6–0.9. (GitHub)

Curated references

Core trainers and docs

  • sd-scripts (SDXL LoRA features, masked loss, block-wise rank/LR). (GitHub)
  • kohya_ss GUI (official GUI for sd-scripts). (GitHub)
  • OneTrainer (GUI with SDXL LoRA template). (GitHub)
  • SD-Trainer (Akegarasu) (GUI using kohya trainer). (GitHub)

Windows on AMD specifics

  • ZLUDA setup (run CUDA tools on AMD in Windows). (GitHub)
  • AMD PyTorch on Windows Preview (current state). (AMD)

Fooocus usage

  • Add LoRAs to models\loras, manage weights and triggers. (GitHub)

Captioning and method tips

  • Caption templates, TOML datasets in sd-scripts. (GitHub)
  • Practical OneTrainer SDXL notes and caption comparisons. (Reddit)

Thanks. I am running Stability Matrix. I am starting with OneTrainer. If that doesn't work out, I'll try the others. Kohya never really worked for me.
