Need help on training LoRA model

I'm trying to train a LoRA on human faces and then create photos with existing txt2img models.

Environment

  • AWS g4dn.xlarge instance (T4 GPU, 16 GB VRAM)
  • kohya_ss master branch
  • 24 images downloaded online, cropped to keep only the faces

Training data and resulting models

Training steps

I first used BLIP to caption the images, using "img, " as the prefix so that "img" acts as the trigger word.
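For reference, here is a minimal sketch of what that captioning step does, assuming the Salesforce/blip-image-captioning-base checkpoint (the kohya_ss captioning utility may use a different BLIP variant and extra options):

import os

from PIL import Image
from transformers import BlipProcessor, BlipForConditionalGeneration

# Illustrative stand-in for the kohya_ss "BLIP Captioning" utility.
image_dir = "/data/crop-test/img/50_img woman"
processor = BlipProcessor.from_pretrained("Salesforce/blip-image-captioning-base")
model = BlipForConditionalGeneration.from_pretrained("Salesforce/blip-image-captioning-base")

for name in sorted(os.listdir(image_dir)):
    if not name.lower().endswith((".jpg", ".jpeg", ".png")):
        continue
    image = Image.open(os.path.join(image_dir, name)).convert("RGB")
    inputs = processor(images=image, return_tensors="pt")
    out = model.generate(**inputs, max_new_tokens=40)
    caption = processor.decode(out[0], skip_special_tokens=True)
    # Write the caption next to the image, prefixed with the trigger word.
    with open(os.path.join(image_dir, os.path.splitext(name)[0] + ".txt"), "w") as f:
        f.write("img, " + caption)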


And confirmed the captioning succeeded:

root@ip-xxxx:/data/crop-test/img/50_img woman# strings *txt
img, a woman with long hair and a black dress
img, a woman with long hair and earrings posing for a picture
img, a woman with long hair and earrings posing for a picture
img, a woman with a big smile on her face
img, a close up of a woman with red lipstick
img, a woman with long hair and a black dress
img, a woman with long hair and a black dress
img, a woman with long hair and a red lipstick
img, a woman with long blonde hair and red lipstick
img, a woman with long hair and a necklace
...

Then I started the training process:

  1. Training → Models → Pretrained model name or path: I picked stabilityai/stable-diffusion-xl-base-1.0
  2. Folder → Output directory for trained model: I picked /data/crop-test/model
  3. Folder → Image folder (containing training images subfolders): I picked /data/crop-test/img
  4. Parameters → Basic: checked No half VAE
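For context, kohya_ss parses the image folder name 50_img woman as <repeats>_<class tokens>, i.e. 50 repeats with class tokens "img woman" (both show up in the training log below). The GUI then writes all settings to a TOML config for sd-scripts. The following is an illustrative reconstruction of that config based on the log, not the exact file:

# Sketch of /data/crop-test/model/config_lora-20250125-120615.toml,
# with values inferred from the training log below -- not an exact copy.
pretrained_model_name_or_path = "stabilityai/stable-diffusion-xl-base-1.0"
train_data_dir = "/data/crop-test/img"
output_dir = "/data/crop-test/model"
logging_dir = "/data/crop-test/log"
resolution = "512,512"
enable_bucket = true
bucket_no_upscale = true
network_module = "networks.lora"
network_dim = 8
network_alpha = 1
optimizer_type = "AdamW8bit"
train_batch_size = 1
max_train_steps = 1600
lr_warmup_steps = 160
mixed_precision = "fp16"
no_half_vae = true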


The training process completed successfully:

(venv) root@ip-172-31-30-185:/home/kohya_ss/package# ./gui.sh --share
Warning: LD_LIBRARY_PATH environment variable is not set.
Certain functionalities may not work correctly.
Please ensure that the required libraries are properly configured.

If you use WSL2 you may want to: export LD_LIBRARY_PATH=/usr/lib/wsl/lib/

12:05:26-185279 INFO     Kohya_ss GUI version: v24.1.7
12:05:26-258257 INFO     Submodule initialized and updated.
12:05:26-259637 INFO     nVidia toolkit detected
12:05:28-843697 INFO     Torch 2.1.2+cu118
12:05:28-988088 INFO     Torch backend: nVidia CUDA 11.8 cuDNN 8700
12:05:29-018737 INFO     Torch detected GPU: Tesla T4 VRAM 14918 Arch (7, 5) Cores 40
12:05:29-025589 INFO     Python version is 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
12:05:29-027176 INFO     Verifying modules installation status from /home/kohya_ss/package/requirements_linux.txt...
12:05:29-031445 INFO     Verifying modules installation status from requirements.txt...
2025-01-25 12:05:33.860689: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:05:33.911873: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-25 12:05:33.911921: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-25 12:05:33.913381: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-25 12:05:33.927310: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:05:33.929022: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-25 12:05:35.732607: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
12:05:40-607170 INFO     headless: False
12:05:40-691975 INFO     Using shell=True when running external commands...
/home/kohya_ss/package/venv/lib/python3.10/site-packages/gradio/analytics.py:106: UserWarning: IMPORTANT: You are using gradio version 4.43.0, however version 4.44.1 is available, please upgrade.
--------
  warnings.warn(
Running on local URL:  http://127.0.0.1:7860
Running on public URL: https://xxxx.gradio.live

This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
12:05:57-434344 INFO     SDXL model selected. Setting sdxl parameters
12:06:02-306668 INFO     SDXL model selected. Setting sdxl parameters
12:06:15-520353 INFO     Start training LoRA Standard ...
12:06:15-521888 INFO     Validating lr scheduler arguments...
12:06:15-523131 INFO     Validating optimizer arguments...
12:06:15-524087 INFO     Validating /data/crop-test/log existence and writability... SUCCESS
12:06:15-525141 INFO     Validating /data/crop-test/model existence and writability... SUCCESS
12:06:15-526304 INFO     Validating stabilityai/stable-diffusion-xl-base-1.0 existence... SUCCESS
12:06:15-527309 INFO     Validating /data/crop-test/img existence... SUCCESS
12:06:15-528331 INFO     Folder 50_img woman: 50 repeats found
12:06:15-529470 INFO     Folder 50_img woman: 24 images found
12:06:15-530403 INFO     Folder 50_img woman: 24 * 50 = 1200 steps
12:06:15-531361 INFO     Regulatization factor: 1
12:06:15-532252 INFO     Total steps: 1200
12:06:15-533133 INFO     Train batch size: 1
12:06:15-534001 INFO     Gradient accumulation steps: 1
12:06:15-534896 INFO     Epoch: 1
12:06:15-535786 INFO     Max train steps: 1600
12:06:15-536643 INFO     stop_text_encoder_training = 0
12:06:15-537544 INFO     lr_warmup_steps = 160
12:06:15-539077 INFO     Saving training config to /data/crop-test/model/last_20250125-120615.json...
12:06:15-540462 INFO     Executing command: /home/kohya_ss/package/venv/bin/accelerate launch --dynamo_backend no --dynamo_mode default --mixed_precision fp16 --num_processes 1
                         --num_machines 1 --num_cpu_threads_per_process 2 /home/kohya_ss/package/sd-scripts/sdxl_train_network.py --config_file
                         /data/crop-test/model/config_lora-20250125-120615.toml
12:06:15-544535 INFO     Command executed.
2025-01-25 12:06:24.777838: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:06:24.823094: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-25 12:06:24.823142: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-25 12:06:24.824557: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-25 12:06:24.831922: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:06:24.832176: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-25 12:06:25.870362: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-01-25 12:06:27 INFO     Loading settings from /data/crop-test/model/config_lora-20250125-120615.toml...                                                        train_util.py:4174
                    INFO     /data/crop-test/model/config_lora-20250125-120615                                                                                      train_util.py:4193
2025-01-25 12:06:27 INFO     prepare tokenizers                                                                                                                 sdxl_train_util.py:138
                    INFO     update token length: 75                                                                                                            sdxl_train_util.py:163
                    INFO     Using DreamBooth method.                                                                                                             train_network.py:172
2025-01-25 12:06:28 INFO     prepare images.                                                                                                                        train_util.py:1815
                    INFO     found directory /data/crop-test/img/50_img woman contains 24 image files                                                               train_util.py:1762
                    INFO     1200 train images with repeating.                                                                                                      train_util.py:1856
                    INFO     0 reg images.                                                                                                                          train_util.py:1859
                    WARNING  no regularization images                                                                                                                train_util.py:1864
                    INFO     [Dataset 0]                                                                                                                            config_util.py:572
                               batch_size: 1
                               resolution: (512, 512)
                               enable_bucket: True
                               network_multiplier: 1.0
                               min_bucket_reso: 256
                               max_bucket_reso: 2048
                               bucket_reso_steps: 64
                               bucket_no_upscale: True

                               [Subset 0 of Dataset 0]
                                 image_dir: "/data/crop-test/img/50_img woman"
                                 image_count: 24
                                 num_repeats: 50
                                 shuffle_caption: False
                                 keep_tokens: 0
                                 keep_tokens_separator:
                                 caption_separator: ,
                                 secondary_separator: None
                                 enable_wildcard: False
                                 caption_dropout_rate: 0.0
                                 caption_dropout_every_n_epoches: 0
                                 caption_tag_dropout_rate: 0.0
                                 caption_prefix: None
                                 caption_suffix: None
                                 color_aug: False
                                 flip_aug: False
                                 face_crop_aug_range: None
                                 random_crop: False
                                 token_warmup_min: 1,
                                 token_warmup_step: 0,
                                 alpha_mask: False,
                                 is_reg: False
                                 class_tokens: img woman
                                 caption_extension: .txt


                    INFO     [Dataset 0]                                                                                                                            config_util.py:578
                    INFO     loading image sizes.                                                                                                                    train_util.py:911
100%|██████████| 24/24 [00:00<00:00, 4340.24it/s]
                    INFO     make buckets                                                                                                                            train_util.py:917
                    WARNING  min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically train_util.py:934
                    INFO     number of images per bucket (including repeats)                                                                                          train_util.py:963
                    INFO     bucket 0: resolution (256, 256), count: 150                                                                                             train_util.py:968
                    INFO     bucket 1: resolution (320, 320), count: 50                                                                                              train_util.py:968
                    INFO     bucket 2: resolution (320, 384), count: 50                                                                                              train_util.py:968
                    INFO     bucket 3: resolution (384, 384), count: 100                                                                                             train_util.py:968
                    INFO     bucket 4: resolution (384, 448), count: 50                                                                                              train_util.py:968
                    INFO     bucket 5: resolution (448, 384), count: 50                                                                                              train_util.py:968
                    INFO     bucket 6: resolution (448, 448), count: 300                                                                                             train_util.py:968
                    INFO     bucket 7: resolution (448, 512), count: 200                                                                                             train_util.py:968
                    INFO     bucket 8: resolution (448, 576), count: 50                                                                                              train_util.py:968
                    INFO     bucket 9: resolution (512, 448), count: 150                                                                                             train_util.py:968
                    INFO     bucket 10: resolution (512, 512), count: 50                                                                                             train_util.py:968
                    INFO     mean ar error (without repeats): 0.04031468409762353                                                                                    train_util.py:973
                    WARNING  clip_skip will be unexpected / clip_skip does not work in SDXL training                                                              sdxl_train_util.py:352
                    INFO     preparing accelerator                                                                                                                train_network.py:225
accelerator device: cuda
                    INFO     loading model for process 0/1                                                                                                       sdxl_train_util.py:33
                    INFO     load Diffusers pretrained models: stabilityai/stable-diffusion-xl-base-1.0, variant=fp16                                            sdxl_train_util.py:88
Loading pipeline components...: 100%|██████████| 6/6 [00:49<00:00,  8.20s/it]
2025-01-25 12:07:17 INFO     U-Net converted to original U-Net                                                                                                  sdxl_train_util.py:124
2025-01-25 12:07:18 INFO     Enable xformers for U-Net                                                                                                              train_util.py:3040
import network module: networks.lora
2025-01-25 12:07:21 INFO     [Dataset 0]                                                                                                                            train_util.py:2323
                    INFO     caching latents.                                                                                                                       train_util.py:1095
                    INFO     checking cache validity...                                                                                                             train_util.py:1105
100%|██████████| 24/24 [00:00<00:00, 299593.14it/s]
                    INFO     caching latents...                                                                                                                     train_util.py:1144
  0%|                                                                                                                                                          | 0/24 [00:00<?, ?it/s]/home/kohya_ss/package/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py:456: UserWarning: Applied workaround for CuDNN issue, install nvrtc.so (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:80.)
  return F.conv2d(input, weight, bias, self.stride,
100%|██████████| 24/24 [00:10<00:00,  2.28it/s]
2025-01-25 12:07:42 INFO     create LoRA network. base dim (rank): 8, alpha: 1                                                                                             lora.py:928
                    INFO     neuron dropout: p=None, rank dropout: p=None, module dropout: p=None                                                                          lora.py:929
                    INFO     create LoRA for Text Encoder 1:                                                                                                              lora.py:1020
                    INFO     create LoRA for Text Encoder 2:                                                                                                              lora.py:1020
                    INFO     create LoRA for Text Encoder: 264 modules.                                                                                                   lora.py:1028
2025-01-25 12:07:43 INFO     create LoRA for U-Net: 722 modules.                                                                                                          lora.py:1036
                    INFO     enable LoRA for text encoder: 264 modules                                                                                                    lora.py:1077
                    INFO     enable LoRA for U-Net: 722 modules                                                                                                           lora.py:1082
prepare optimizer, data loader etc.
                    INFO     use 8-bit AdamW optimizer | {}                                                                                                         train_util.py:4327
running training
  num train images * repeats: 1200
  num reg images: 0
  num batches per epoch: 1200
  num epochs: 2
  batch size per device: 1
  gradient accumulation steps = 1
  total optimization steps: 1600
steps:   0%|                                                                                                                                                 | 0/1600 [00:00<?, ?it/s]
epoch 1/2
2025-01-25 12:07:53 INFO     epoch is incremented. current_epoch: 0, epoch: 1                                                                                        train_util.py:703
steps:  75%|███████▌  | 1200/1600 [22:50<07:36,  1.14s/it, avr_loss=0.136]
saving checkpoint: /data/crop-test/model/last-000001.safetensors

epoch 2/2
2025-01-25 12:30:44 INFO     epoch is incremented. current_epoch: 1, epoch: 2                                                                                        train_util.py:703
steps: 100%|██████████| 1600/1600 [30:20<00:00,  1.14s/it, avr_loss=0.129]
saving checkpoint: /data/crop-test/model/last.safetensors
2025-01-25 12:38:15 INFO     model saved.                                                                                                                        train_network.py:1104
steps: 100%|██████████| 1600/1600 [30:21<00:00,  1.14s/it, avr_loss=0.129]
12:38:23-657178 INFO     Training has ended.
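At that point the output directory contains the files mentioned in the log, roughly (listing reconstructed from the log messages above, not an exact capture):

root@ip-xxxx:/data# ls /data/crop-test/model
config_lora-20250125-120615.toml  last-000001.safetensors
last.safetensors                  last_20250125-120615.json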

Generation steps

I first copied /data/crop-test/model/last.safetensors to /data. Then I created a pipeline script to generate outputs with the trigger word img:

#!/usr/bin/env python3.10

import os

import torch
from diffusers import DiffusionPipeline

pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float32)
# Load the trained LoRA and apply it at 75% strength.
pipe.load_lora_weights("last.safetensors", adapter_name="faceTest", weight_name="faceTest")
pipe.set_adapters(['faceTest'], adapter_weights=[0.75])
pipe.to('cuda')
pipe.fuse_lora(lora_scale=0.75)
pipe.enable_attention_slicing()

prompt = "cinematic photo of jennlaw, studio quality, woman standing in a room, doing a public speech, blur the background, smiling, having a positive face, wearing a v-neck black sweater, 35mm photograph, film, bokeh, professional, 4k, highly detailed"
negative_prompt = "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

images = pipe(prompt,
              negative_prompt=negative_prompt,
              guidance_scale=5,
              num_inference_steps=50,
              width=768,
              height=1024,
              num_images_per_prompt=2).images

os.makedirs("outputs", exist_ok=True)  # the save below fails if outputs/ doesn't exist
for idx, img in enumerate(images):
    img.save(f"outputs/{idx}.jpg")

After a while the results are generated, but they look nothing like the inputs.

These are the inputs:

[input images]

These are the outputs:

[output images]

I have no idea what's wrong. I can't find anything useful online, and ChatGPT didn't help either.

Does anyone know what's wrong?


The prompt may be missing the trigger word:

#prompt = "cinematic photo of jennlaw, studio quality, woman standing in a room, doing a public speech, blur the background, smiling, having a positive face, wearing a v-neck black sweater, 35mm photograph, film, bokeh, professional, 4k, highly detailed"
prompt = "img, cinematic photo of jennlaw, studio quality, woman standing in a room, doing a public speech, blur the background, smiling, having a positive face, wearing a v-neck black sweater, 35mm photograph, film, bokeh, professional, 4k, highly detailed"
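As a quick sanity check that the adapter actually loaded and is active (assuming a recent diffusers version with the PEFT backend), you can also print:

# Should print ['faceTest'] after load_lora_weights / set_adapters.
print(pipe.get_active_adapters())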

Can you take a look here? Need help on training LoRA model · bmaltais/kohya_ss · Discussion #3063 · GitHub
I made multiple improvements, but they still didn't resolve this issue…


I wonder. I tried it with the following code, but maybe the LoRA is not very effective. However, when using KohyaSS, it seems to be quite effective with the default values…

import torch
from diffusers import DiffusionPipeline, AutoencoderKL

scale = 1.0
device = "cuda"
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16)
pipe.vae = AutoencoderKL.from_pretrained("madebyollin/sdxl-vae-fp16-fix", torch_dtype=torch.float16)
pipe.load_lora_weights("last.safetensors", adapter_name="faceTest", weight_name="faceTest")
pipe.set_adapters(['faceTest'], adapter_weights=[scale])
pipe.to(device)
pipe.fuse_lora(lora_scale=scale)
#pipe.enable_attention_slicing()

prompt = "img, cinematic photo of a woman img, studio quality, woman standing in a room, doing a public speech, blur the background, smiling, having a positive face, wearing a v-neck black sweater, 35mm photograph, film, bokeh, professional, 4k, highly detailed"
negative_prompt = "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

images = pipe(prompt,
              negative_prompt=negative_prompt,
              guidance_scale=5,
              num_inference_steps=50,
              width=768,
              height=1024,
              num_images_per_prompt=1).images

for idx, img in enumerate(images):
    img.save(f"outputs/{idx}.jpg")
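One way to see whether the LoRA has any effect at all is to sweep the adapter weight with a fixed seed and compare the outputs. A minimal sketch, assuming the same pipe, prompt, and device as above but without calling fuse_lora() (fused weights can no longer be rescaled through set_adapters):

# At scale 0.0 the output should match the base model, so any visible
# drift as the scale increases confirms the LoRA is actually applied.
for scale in (0.0, 0.5, 1.0):
    pipe.set_adapters(["faceTest"], adapter_weights=[scale])
    gen = torch.Generator(device=device).manual_seed(0)  # same seed each run
    image = pipe(prompt,
                 negative_prompt=negative_prompt,
                 guidance_scale=5,
                 num_inference_steps=30,
                 width=768,
                 height=1024,
                 generator=gen).images[0]
    image.save(f"outputs/scale_{scale}.jpg")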

Maybe start with ComfyUI first; it's easier. See Need help on training LoRA model · bmaltais/kohya_ss · Discussion #3063 · GitHub

I ran a few tests; the generated images looked a little like the input ID images, but the results are still quite bad, and I'm not sure why.


It's resolved. I changed the base model and modified several sd-scripts parameters, and now it works. Moving on.
