Need help on training LoRA model

oddball516 · January 26, 2025, 10:30am

I’m trying to train a LoRA on human faces and then generate photos with an existing txt2img model.

Environment

AWS g4dn.xlarge instance (T4 GPU, 16GB vRAM)
kohya_ss master branch
24 images downloaded online, cropped to keep only the faces

Training data and results models

Download crop-test-done.tar.gz from Upload Files | Free File Upload and Transfer Up To 10 GB

Training steps

I first used BLIP to caption the images, with "img," as the prefix / trigger word.

Then I confirmed the captioning succeeded:

root@ip-xxxx:/data/crop-test/img/50_img woman# strings *txt
img, a woman with long hair and a black dress
img, a woman with long hair and earrings posing for a picture
img, a woman with long hair and earrings posing for a picture
img, a woman with a big smile on her face
img, a close up of a woman with red lipstick
img, a woman with long hair and a black dress
img, a woman with long hair and a black dress
img, a woman with long hair and a red lipstick
img, a woman with long blonde hair and red lipstick
img, a woman with long hair and a necklace
...
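The strings *txt check above only prints the captions that exist. A small script can also flag images that have no caption file and captions whose first tag is not the trigger word. This is a minimal sketch, assuming each caption sits next to its image with the same base name and a .txt extension (the training log below shows caption_extension: .txt); check_captions and the image-extension list are hypothetical, not part of kohya_ss.

```python
from pathlib import Path


def check_captions(image_dir, trigger="img", caption_ext=".txt",
                   image_exts=(".jpg", ".jpeg", ".png", ".webp")):
    """Return (missing, untagged).

    missing:  image files that have no caption file next to them
    untagged: caption files whose first comma-separated tag is not `trigger`
    The image extensions are an assumption; adjust them to your dataset.
    """
    missing, untagged = [], []
    for image in sorted(Path(image_dir).iterdir()):
        if image.suffix.lower() not in image_exts:
            continue  # skip caption files and anything that is not an image
        caption_file = image.with_suffix(caption_ext)
        if not caption_file.exists():
            missing.append(image.name)
            continue
        text = caption_file.read_text(encoding="utf-8").strip()
        first_tag = text.split(",", 1)[0].strip()
        if first_tag != trigger:
            untagged.append(caption_file.name)
    return missing, untagged
```

For example, check_captions("/data/crop-test/img/50_img woman") should return two empty lists if all 24 images are captioned with the "img," prefix.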

Then I started training with these settings:

Training → Models → Pretrained model name or path: stabilityai/stable-diffusion-xl-base-1.0
Folder → Output directory for trained model: /data/crop-test/model
Folder → Image folder (containing training images subfolders): /data/crop-test/img
Parameters → Basic: checked No half VAE
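The image folder is expected to contain subfolders named <repeats>_<class tokens>; the log below reads 50_img woman as 50 repeats with class tokens "img woman". The sketch below mirrors that naming convention so you can preview what the GUI will count before training. It is an illustration only, not kohya_ss's own code; parse_subset_folder, summarize_image_folder, and the image-extension set are assumptions.

```python
from pathlib import Path

# Assumed set of image extensions; kohya_ss may accept others.
IMAGE_EXTS = {".jpg", ".jpeg", ".png", ".webp", ".bmp"}


def parse_subset_folder(name):
    """Split '<repeats>_<class tokens>' into (repeats, class_tokens)."""
    repeats, sep, class_tokens = name.partition("_")
    if not sep or not repeats.isdigit():
        raise ValueError(f"expected '<repeats>_<class tokens>', got {name!r}")
    return int(repeats), class_tokens


def summarize_image_folder(image_folder):
    """Return (folder, repeats, class tokens, images, images * repeats) per subfolder."""
    rows = []
    for sub in sorted(Path(image_folder).iterdir()):
        if not sub.is_dir():
            continue
        repeats, class_tokens = parse_subset_folder(sub.name)
        n_images = sum(1 for f in sub.iterdir() if f.suffix.lower() in IMAGE_EXTS)
        rows.append((sub.name, repeats, class_tokens, n_images, n_images * repeats))
    return rows
```

Run against /data/crop-test/img, it would be expected to report ('50_img woman', 50, 'img woman', 24, 1200), matching the GUI's "24 * 50 = 1200 steps" line.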

Training completed successfully:

(venv) root@ip-172-31-30-185:/home/kohya_ss/package# ./gui.sh --share
Warning: LD_LIBRARY_PATH environment variable is not set.
Certain functionalities may not work correctly.
Please ensure that the required libraries are properly configured.

If you use WSL2 you may want to: export LD_LIBRARY_PATH=/usr/lib/wsl/lib/

12:05:26-185279 INFO     Kohya_ss GUI version: v24.1.7
12:05:26-258257 INFO     Submodule initialized and updated.
12:05:26-259637 INFO     nVidia toolkit detected
12:05:28-843697 INFO     Torch 2.1.2+cu118
12:05:28-988088 INFO     Torch backend: nVidia CUDA 11.8 cuDNN 8700
12:05:29-018737 INFO     Torch detected GPU: Tesla T4 VRAM 14918 Arch (7, 5) Cores 40
12:05:29-025589 INFO     Python version is 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
12:05:29-027176 INFO     Verifying modules installation status from /home/kohya_ss/package/requirements_linux.txt...
12:05:29-031445 INFO     Verifying modules installation status from requirements.txt...
2025-01-25 12:05:33.860689: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:05:33.911873: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-25 12:05:33.911921: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-25 12:05:33.913381: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-25 12:05:33.927310: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:05:33.929022: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-25 12:05:35.732607: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
12:05:40-607170 INFO     headless: False
12:05:40-691975 INFO     Using shell=True when running external commands...
/home/kohya_ss/package/venv/lib/python3.10/site-packages/gradio/analytics.py:106: UserWarning: IMPORTANT: You are using gradio version 4.43.0, however version 4.44.1 is available, please upgrade.
--------
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://xxxx.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
12:05:57-434344 INFO     SDXL model selected. Setting sdxl parameters
12:06:02-306668 INFO     SDXL model selected. Setting sdxl parameters
12:06:15-520353 INFO     Start training LoRA Standard ...
12:06:15-521888 INFO     Validating lr scheduler arguments...
12:06:15-523131 INFO     Validating optimizer arguments...
12:06:15-524087 INFO     Validating /data/crop-test/log existence and writability... SUCCESS
12:06:15-525141 INFO     Validating /data/crop-test/model existence and writability... SUCCESS
12:06:15-526304 INFO     Validating stabilityai/stable-diffusion-xl-base-1.0 existence... SUCCESS
12:06:15-527309 INFO     Validating /data/crop-test/img existence... SUCCESS
12:06:15-528331 INFO     Folder 50_img woman: 50 repeats found
12:06:15-529470 INFO     Folder 50_img woman: 24 images found
12:06:15-530403 INFO     Folder 50_img woman: 24 * 50 = 1200 steps
12:06:15-531361 INFO     Regulatization factor: 1
12:06:15-532252 INFO     Total steps: 1200
12:06:15-533133 INFO     Train batch size: 1
12:06:15-534001 INFO     Gradient accumulation steps: 1
12:06:15-534896 INFO     Epoch: 1
12:06:15-535786 INFO     Max train steps: 1600
12:06:15-536643 INFO     stop_text_encoder_training = 0
12:06:15-537544 INFO     lr_warmup_steps = 160
12:06:15-539077 INFO     Saving training config to /data/crop-test/model/last_20250125-120615.json...
12:06:15-540462 INFO     Executing command: /home/kohya_ss/package/venv/bin/accelerate launch --dynamo_backend no --dynamo_mode default --mixed_precision fp16 --num_processes 1
                         --num_machines 1 --num_cpu_threads_per_process 2 /home/kohya_ss/package/sd-scripts/sdxl_train_network.py --config_file
                         /data/crop-test/model/config_lora-20250125-120615.toml
12:06:15-544535 INFO     Command executed.
2025-01-25 12:06:24.777838: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:06:24.823094: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-25 12:06:24.823142: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-25 12:06:24.824557: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-25 12:06:24.831922: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:06:24.832176: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-25 12:06:25.870362: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-01-25 12:06:27 INFO     Loading settings from /data/crop-test/model/config_lora-20250125-120615.toml...                                                        train_util.py:4174
                    INFO     /data/crop-test/model/config_lora-20250125-120615                                                                                      train_util.py:4193
2025-01-25 12:06:27 INFO     prepare tokenizers                                                                                                                 sdxl_train_util.py:138
                    INFO     update token length: 75                                                                                                            sdxl_train_util.py:163
                    INFO     Using DreamBooth method.                                                                                                             train_network.py:172
2025-01-25 12:06:28 INFO     prepare images.                                                                                                                        train_util.py:1815
                    INFO     found directory /data/crop-test/img/50_img woman contains 24 image files                                                               train_util.py:1762
                    INFO     1200 train images with repeating.                                                                                                      train_util.py:1856
                    INFO     0 reg images.                                                                                                                          train_util.py:1859
                    WARNING  no regularization images / 正則化画像が見つかりませんでした                                                                            train_util.py:1864
                    INFO     [Dataset 0]                                                                                                                            config_util.py:572
                               batch_size: 1
                               resolution: (512, 512)
                               enable_bucket: True
                               network_multiplier: 1.0
                               min_bucket_reso: 256
                               max_bucket_reso: 2048
                               bucket_reso_steps: 64
                               bucket_no_upscale: True

                               [Subset 0 of Dataset 0]
                                 image_dir: "/data/crop-test/img/50_img woman"
                                 image_count: 24
                                 num_repeats: 50
                                 shuffle_caption: False
                                 keep_tokens: 0
                                 keep_tokens_separator:
                                 caption_separator: ,
                                 secondary_separator: None
                                 enable_wildcard: False
                                 caption_dropout_rate: 0.0
                                 caption_dropout_every_n_epoches: 0
                                 caption_tag_dropout_rate: 0.0
                                 caption_prefix: None
                                 caption_suffix: None
                                 color_aug: False
                                 flip_aug: False
                                 face_crop_aug_range: None
                                 random_crop: False
                                 token_warmup_min: 1,
                                 token_warmup_step: 0,
                                 alpha_mask: False,
                                 is_reg: False
                                 class_tokens: img woman
                                 caption_extension: .txt


                    INFO     [Dataset 0]                                                                                                                            config_util.py:578
                    INFO     loading image sizes.                                                                                                                    train_util.py:911
100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 4340.24it/s]
                    INFO     make buckets                                                                                                                            train_util.py:917
                    WARNING  min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically train_util.py:934
                             / bucket_no_upscaleが指定された場合は、bucketの解像度は画像サイズから自動計算されるため、min_bucket_resoとmax_bucket_resoは無視されます
                    INFO     number of images (including repeats) / 各bucketの画像枚数（繰り返し回数を含む）                                                         train_util.py:963
                    INFO     bucket 0: resolution (256, 256), count: 150                                                                                             train_util.py:968
                    INFO     bucket 1: resolution (320, 320), count: 50                                                                                              train_util.py:968
                    INFO     bucket 2: resolution (320, 384), count: 50                                                                                              train_util.py:968
                    INFO     bucket 3: resolution (384, 384), count: 100                                                                                             train_util.py:968
                    INFO     bucket 4: resolution (384, 448), count: 50                                                                                              train_util.py:968
                    INFO     bucket 5: resolution (448, 384), count: 50                                                                                              train_util.py:968
                    INFO     bucket 6: resolution (448, 448), count: 300                                                                                             train_util.py:968
                    INFO     bucket 7: resolution (448, 512), count: 200                                                                                             train_util.py:968
                    INFO     bucket 8: resolution (448, 576), count: 50                                                                                              train_util.py:968
                    INFO     bucket 9: resolution (512, 448), count: 150                                                                                             train_util.py:968
                    INFO     bucket 10: resolution (512, 512), count: 50                                                                                             train_util.py:968
                    INFO     mean ar error (without repeats): 0.04031468409762353                                                                                    train_util.py:973
                    WARNING  clip_skip will be unexpected / SDXL学習ではclip_skipは動作しません                                                                 sdxl_train_util.py:352
                    INFO     preparing accelerator                                                                                                                train_network.py:225
accelerator device: cuda
                    INFO     loading model for process 0/1                                                                                                       sdxl_train_util.py:33
                    INFO     load Diffusers pretrained models: stabilityai/stable-diffusion-xl-base-1.0, variant=fp16                                            sdxl_train_util.py:88
Loading pipeline components...: 100%|███████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 6/6 [00:49<00:00,  8.20s/it]
2025-01-25 12:07:17 INFO     U-Net converted to original U-Net                                                                                                  sdxl_train_util.py:124
2025-01-25 12:07:18 INFO     Enable xformers for U-Net                                                                                                              train_util.py:3040
import network module: networks.lora
2025-01-25 12:07:21 INFO     [Dataset 0]                                                                                                                            train_util.py:2323
                    INFO     caching latents.                                                                                                                       train_util.py:1095
                    INFO     checking cache validity...                                                                                                             train_util.py:1105
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:00<00:00, 299593.14it/s]
                    INFO     caching latents...                                                                                                                     train_util.py:1144
  0%|                                                                                                                                                          | 0/24 [00:00<?, ?it/s]/home/kohya_ss/package/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py:456: UserWarning: Applied workaround for CuDNN issue, install nvrtc.so (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:80.)
  return F.conv2d(input, weight, bias, self.stride,
100%|█████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 24/24 [00:10<00:00,  2.28it/s]
2025-01-25 12:07:42 INFO     create LoRA network. base dim (rank): 8, alpha: 1                                                                                             lora.py:928
                    INFO     neuron dropout: p=None, rank dropout: p=None, module dropout: p=None                                                                          lora.py:929
                    INFO     create LoRA for Text Encoder 1:                                                                                                              lora.py:1020
                    INFO     create LoRA for Text Encoder 2:                                                                                                              lora.py:1020
                    INFO     create LoRA for Text Encoder: 264 modules.                                                                                                   lora.py:1028
2025-01-25 12:07:43 INFO     create LoRA for U-Net: 722 modules.                                                                                                          lora.py:1036
                    INFO     enable LoRA for text encoder: 264 modules                                                                                                    lora.py:1077
                    INFO     enable LoRA for U-Net: 722 modules                                                                                                           lora.py:1082
prepare optimizer, data loader etc.
                    INFO     use 8-bit AdamW optimizer | {}                                                                                                         train_util.py:4327
running training / 学習開始
  num train images * repeats / 学習画像の数×繰り返し回数: 1200
  num reg images / 正則化画像の数: 0
  num batches per epoch / 1epochのバッチ数: 1200
  num epochs / epoch数: 2
  batch size per device / バッチサイズ: 1
  gradient accumulation steps / 勾配を合計するステップ数 = 1
  total optimization steps / 学習ステップ数: 1600
steps:   0%|                                                                                                                                                 | 0/1600 [00:00<?, ?it/s]
epoch 1/2
2025-01-25 12:07:53 INFO     epoch is incremented. current_epoch: 0, epoch: 1                                                                                        train_util.py:703
steps:  75%|████████████████████████████████████████████████████████████████████████████████████████▌                             | 1200/1600 [22:50<07:36,  1.14s/it, avr_loss=0.136]
saving checkpoint: /data/crop-test/model/last-000001.safetensors

epoch 2/2
2025-01-25 12:30:44 INFO     epoch is incremented. current_epoch: 1, epoch: 2                                                                                        train_util.py:703
steps: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1600/1600 [30:20<00:00,  1.14s/it, avr_loss=0.129]
saving checkpoint: /data/crop-test/model/last.safetensors
2025-01-25 12:38:15 INFO     model saved.                                                                                                                        train_network.py:1104
steps: 100%|██████████████████████████████████████████████████████████████████████████████████████████████████████████████████████| 1600/1600 [30:21<00:00,  1.14s/it, avr_loss=0.129]
12:38:23-657178 INFO     Training has ended.
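The step counts in this log fit together as follows: 24 images × 50 repeats give 1200 samples, and at batch size 1 that is 1200 steps per epoch. Because max train steps is 1600, sd-scripts rounds up to ceil(1600 / 1200) = 2 epochs and stops 400 steps into the second one, which appears to be why it reports 2 epochs even though the GUI shows "Epoch: 1". The 160 warmup steps are 10% of 1600, and the bucket counts add up to 1200. A minimal sketch of that arithmetic, assuming batch size 1 and no regularization images (with larger batches, bucketing can add a few partial batches); training_schedule is a hypothetical helper:

```python
import math


def training_schedule(num_images, repeats, max_train_steps,
                      batch_size=1, grad_accum=1):
    """Sketch of how the log's step counts relate (exact only for batch_size=1).

    Returns (samples_per_epoch, updates_per_epoch, epochs, steps_in_last_epoch).
    """
    samples_per_epoch = num_images * repeats
    batches_per_epoch = math.ceil(samples_per_epoch / batch_size)
    updates_per_epoch = math.ceil(batches_per_epoch / grad_accum)
    # max_train_steps overrides the epoch setting: round up to whole epochs
    epochs = math.ceil(max_train_steps / updates_per_epoch)
    steps_in_last_epoch = max_train_steps - (epochs - 1) * updates_per_epoch
    return samples_per_epoch, updates_per_epoch, epochs, steps_in_last_epoch


# Values copied from the log above
print(training_schedule(24, 50, max_train_steps=1600))  # (1200, 1200, 2, 400)
print(160 / 1600)                                        # 0.1, i.e. 10% warmup

# Bucket counts include repeats; dividing by 50 recovers images per bucket
bucket_counts = [150, 50, 50, 100, 50, 50, 300, 200, 50, 150, 50]
print(sum(bucket_counts))                   # 1200
print(sum(c // 50 for c in bucket_counts))  # 24
```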

Generation steps

I first copied /data/crop-test/model/last.safetensors to /data. Then I created a pipeline script to generate outputs with the trigger word img:

#!/usr/bin/env python3.10

import os

import torch
from diffusers import DiffusionPipeline

# Load SDXL base in float32 (as in the original run)
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float32)
# last.safetensors is a single local file, so weight_name is not needed
pipe.load_lora_weights("last.safetensors", adapter_name="faceTest")
pipe.set_adapters(["faceTest"], adapter_weights=[0.75])
pipe.to("cuda")
# set_adapters already applies the 0.75 weight; passing lora_scale=0.75 here as well
# may scale the LoRA a second time, so merge at the weight set above
pipe.fuse_lora()
pipe.enable_attention_slicing()

prompt = "cinematic photo of jennlaw, studio quality, woman standing in a room, doing a public speech, blur the background, smiling, having a positive face, wearing a v-neck black sweater, 35mm photograph, film, bokeh, professional, 4k, highly detailed"
negative_prompt = "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

images = pipe(prompt,
              negative_prompt=negative_prompt,
              guidance_scale=5,
              num_inference_steps=50,
              width=768,
              height=1024,
              num_images_per_prompt=2).images

# Make sure the output directory exists before saving
os.makedirs("outputs", exist_ok=True)
for idx, img in enumerate(images):
    img.save(f"outputs/{idx}.jpg")

After a while the results were generated, but they look nothing like the inputs.

These are the inputs:

These are the outputs:

I have no idea what’s wrong. I couldn’t find anything useful online, and ChatGPT didn’t help either.

Does anyone know what’s wrong?

John6666 · January 26, 2025, 11:53am

Your prompt doesn’t contain the trigger word "img". Try adding it:

#prompt = "cinematic photo of jennlaw, studio quality, woman standing in a room, doing a public speech, blur the background, smiling, having a positive face, wearing a v-neck black sweater, 35mm photograph, film, bokeh, professional, 4k, highly detailed"
prompt = "img, cinematic photo of jennlaw, studio quality, woman standing in a room, doing a public speech, blur the background, smiling, having a positive face, wearing a v-neck black sweater, 35mm photograph, film, bokeh, professional, 4k, highly detailed"
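A missing trigger word is easy to overlook in a long prompt, so a quick check before generating can help. A minimal sketch, assuming a whole-word, case-insensitive match is what you want; has_trigger is a hypothetical helper and the prompts are shortened from the ones above:

```python
import re


def has_trigger(prompt, trigger="img"):
    """Return True if `trigger` occurs as a whole word in `prompt`.

    Case-insensitive; 'img' inside a longer word such as 'imgur' does not count.
    """
    pattern = rf"\b{re.escape(trigger)}\b"
    return re.search(pattern, prompt, flags=re.IGNORECASE) is not None


original = "cinematic photo of jennlaw, studio quality, woman standing in a room"
fixed = "img, cinematic photo of jennlaw, studio quality, woman standing in a room"
print(has_trigger(original))  # False
print(has_trigger(fixed))     # True
```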

oddball516 · January 29, 2025, 7:51am

Can you take a look here? Need help on training LoRA model · bmaltais/kohya_ss · Discussion #3063 · GitHub
I made several improvements, but they still didn’t resolve the issue…

John6666 · January 29, 2025, 8:33am

Hmm. I tried it with the following code, and the LoRA doesn’t seem to have much effect. Yet LoRAs trained with kohya_ss’s default values usually seem to work quite well…

import os

import torch
from diffusers import AutoencoderKL, DiffusionPipeline

scale = 1.0
device = "cuda"
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
# SDXL VAE variant patched to run in fp16
pipe.vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
# last.safetensors is a single local file, so weight_name is not needed
pipe.load_lora_weights("last.safetensors", adapter_name="faceTest")
pipe.set_adapters(["faceTest"], adapter_weights=[scale])
pipe.to(device)
pipe.fuse_lora(lora_scale=scale)
#pipe.enable_attention_slicing()

prompt = "img, cinematic photo of a woman img, studio quality, woman standing in a room, doing a public speech, blur the background, smiling, having a positive face, wearing a v-neck black sweater, 35mm photograph, film, bokeh, professional, 4k, highly detailed"
negative_prompt = "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

images = pipe(prompt,
              negative_prompt=negative_prompt,
              guidance_scale=5,
              num_inference_steps=50,
              width=768,
              height=1024,
              num_images_per_prompt=1).images

# Make sure the output directory exists before saving
os.makedirs("outputs", exist_ok=True)
for idx, img in enumerate(images):
    img.save(f"outputs/{idx}.jpg")
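On LoRA strength: the training log reports "base dim (rank): 8, alpha: 1". In the standard LoRA formulation (and, as far as I know, in kohya’s networks.lora), the learned low-rank update is multiplied by alpha / rank before the user-facing multiplier is applied, so with these settings it is scaled by 1/8. The pure-Python sketch below shows that merge on toy 2×2 weights; merge_lora and the matrices are made up for illustration, and treating diffusers’ adapter weight / lora_scale as the multiplier here is an assumption.

```python
def matmul(a, b):
    """Plain-Python matrix product of lists of rows."""
    return [[sum(x * y for x, y in zip(row, col)) for col in zip(*b)] for row in a]


def merge_lora(weight, down, up, alpha, multiplier=1.0):
    """Return weight + multiplier * (alpha / rank) * (up @ down).

    weight: out x in, down: rank x in, up: out x rank.
    """
    rank = len(down)
    scale = multiplier * alpha / rank
    delta = matmul(up, down)
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]


rank, alpha = 8, 1                         # values from the training log
weight = [[0.0, 0.0], [0.0, 0.0]]          # toy base weight
down = [[1.0, 1.0] for _ in range(rank)]   # toy rank x in matrix
up = [[1.0] * rank for _ in range(2)]      # toy out x rank matrix

print(merge_lora(weight, down, up, alpha))                   # [[1.0, 1.0], [1.0, 1.0]]
print(merge_lora(weight, down, up, alpha, multiplier=0.75))  # [[0.75, 0.75], [0.75, 0.75]]
```

Without the alpha / rank factor, these toy matrices would add 8.0 per entry. Because training adapts to this factor, a small alpha / rank ratio does not by itself mean the LoRA is broken.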

oddball516 · January 30, 2025, 6:49am

Maybe start with ComfyUI first; it’s easier. See Need help on training LoRA model · bmaltais/kohya_ss · Discussion #3063 · GitHub

I ran a few tests. The generated image looked a little like the input ID image, but the result is still quite bad, and I’m not sure why.

oddball516 · January 30, 2025, 3:42pm

It’s resolved. I changed the base model and modified several sd-scripts parameters, and now it works. Moving on.
