I'm trying to train a LoRA on human faces and then generate photos with an existing txt2img model.
Environment
- AWS g4dn.xlarge instance (T4 GPU, 16GB vRAM)
- kohya_ss master branch
- Downloaded 24 images online and cropped them to keep only the faces
Training data and resulting models
- Download crop-test-done.tar.gz from Upload Files | Free File Upload and Transfer Up To 10 GB
Training steps
I first used BLIP to caption the images, with "img," as the caption prefix / trigger word.
Then I confirmed that the captions were written correctly:
root@ip-xxxx:/data/crop-test/img/50_img woman# strings *txt
img, a woman with long hair and a black dress
img, a woman with long hair and earrings posing for a picture
img, a woman with long hair and earrings posing for a picture
img, a woman with a big smile on her face
img, a close up of a woman with red lipstick
img, a woman with long hair and a black dress
img, a woman with long hair and a black dress
img, a woman with long hair and a red lipstick
img, a woman with long blonde hair and red lipstick
img, a woman with long hair and a necklace
...
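The same check can be scripted so that it flags any caption file missing the prefix. This is only a minimal sketch: the function name captions_missing_prefix is made up for illustration, and it assumes the layout shown above (one .txt caption per image, each starting with "img,").

```python
import tempfile
from pathlib import Path


def captions_missing_prefix(caption_dir, prefix="img,", extension=".txt"):
    """Return the names of caption files whose text does not start with `prefix`."""
    missing = []
    for path in sorted(Path(caption_dir).glob(f"*{extension}")):
        text = path.read_text(encoding="utf-8").strip()
        if not text.startswith(prefix):
            missing.append(path.name)
    return missing


# Self-contained demo on a temporary folder (hypothetical file names).
with tempfile.TemporaryDirectory() as tmp:
    Path(tmp, "a.txt").write_text("img, a woman with long hair and a black dress\n", encoding="utf-8")
    Path(tmp, "b.txt").write_text("a woman with a big smile on her face\n", encoding="utf-8")
    demo_missing = captions_missing_prefix(tmp)

print(demo_missing)  # ['b.txt']
```

On the real data this would be called as captions_missing_prefix("/data/crop-test/img/50_img woman"), where an empty list means every caption carries the prefix.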
Then I started the training process with these settings:
- Training → Models → Pretrained model name or path: stabilityai/stable-diffusion-xl-base-1.0
- Folder → Output directory for trained model: /data/crop-test/model
- Folder → Image folder (containing training images subfolders): /data/crop-test/img
- Parameters → Basic: checked No half VAE
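The image folder contains one subfolder named "50_img woman", and the training log in this post reports it as 50 repeats with class tokens "img woman". A rough sketch of how such a name splits into its two parts, assuming the repeats_classtokens pattern that the log suggests (parse_subfolder_name is a hypothetical helper, not kohya_ss code):

```python
import re


def parse_subfolder_name(name):
    """Split a '<repeats>_<class tokens>' folder name into (repeats, class_tokens)."""
    match = re.fullmatch(r"(\d+)_(.+)", name)
    if match is None:
        raise ValueError(f"folder name does not look like '<repeats>_<tokens>': {name!r}")
    return int(match.group(1)), match.group(2)


repeats, class_tokens = parse_subfolder_name("50_img woman")
print(repeats, class_tokens)  # 50 img woman
```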
The training process completed successfully:
(venv) root@ip-172-31-30-185:/home/kohya_ss/package# ./gui.sh --share
Warning: LD_LIBRARY_PATH environment variable is not set.
Certain functionalities may not work correctly.
Please ensure that the required libraries are properly configured.
If you use WSL2 you may want to: export LD_LIBRARY_PATH=/usr/lib/wsl/lib/
12:05:26-185279 INFO Kohya_ss GUI version: v24.1.7
12:05:26-258257 INFO Submodule initialized and updated.
12:05:26-259637 INFO nVidia toolkit detected
12:05:28-843697 INFO Torch 2.1.2+cu118
12:05:28-988088 INFO Torch backend: nVidia CUDA 11.8 cuDNN 8700
12:05:29-018737 INFO Torch detected GPU: Tesla T4 VRAM 14918 Arch (7, 5) Cores 40
12:05:29-025589 INFO Python version is 3.10.12 (main, Jan 17 2025, 14:35:34) [GCC 11.4.0]
12:05:29-027176 INFO Verifying modules installation status from /home/kohya_ss/package/requirements_linux.txt...
12:05:29-031445 INFO Verifying modules installation status from requirements.txt...
2025-01-25 12:05:33.860689: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:05:33.911873: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-25 12:05:33.911921: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-25 12:05:33.913381: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-25 12:05:33.927310: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:05:33.929022: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-25 12:05:35.732607: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
12:05:40-607170 INFO headless: False
12:05:40-691975 INFO Using shell=True when running external commands...
/home/kohya_ss/package/venv/lib/python3.10/site-packages/gradio/analytics.py:106: UserWarning: IMPORTANT: You are using gradio version 4.43.0, however version 4.44.1 is available, please upgrade.
--------
warnings.warn(
Running on local URL: http://127.0.0.1:7860
Running on public URL: https://xxxx.gradio.live
This share link expires in 72 hours. For free permanent hosting and GPU upgrades, run `gradio deploy` from Terminal to deploy to Spaces (https://huggingface.co/spaces)
12:05:57-434344 INFO SDXL model selected. Setting sdxl parameters
12:06:02-306668 INFO SDXL model selected. Setting sdxl parameters
12:06:15-520353 INFO Start training LoRA Standard ...
12:06:15-521888 INFO Validating lr scheduler arguments...
12:06:15-523131 INFO Validating optimizer arguments...
12:06:15-524087 INFO Validating /data/crop-test/log existence and writability... SUCCESS
12:06:15-525141 INFO Validating /data/crop-test/model existence and writability... SUCCESS
12:06:15-526304 INFO Validating stabilityai/stable-diffusion-xl-base-1.0 existence... SUCCESS
12:06:15-527309 INFO Validating /data/crop-test/img existence... SUCCESS
12:06:15-528331 INFO Folder 50_img woman: 50 repeats found
12:06:15-529470 INFO Folder 50_img woman: 24 images found
12:06:15-530403 INFO Folder 50_img woman: 24 * 50 = 1200 steps
12:06:15-531361 INFO Regulatization factor: 1
12:06:15-532252 INFO Total steps: 1200
12:06:15-533133 INFO Train batch size: 1
12:06:15-534001 INFO Gradient accumulation steps: 1
12:06:15-534896 INFO Epoch: 1
12:06:15-535786 INFO Max train steps: 1600
12:06:15-536643 INFO stop_text_encoder_training = 0
12:06:15-537544 INFO lr_warmup_steps = 160
12:06:15-539077 INFO Saving training config to /data/crop-test/model/last_20250125-120615.json...
12:06:15-540462 INFO Executing command: /home/kohya_ss/package/venv/bin/accelerate launch --dynamo_backend no --dynamo_mode default --mixed_precision fp16 --num_processes 1
--num_machines 1 --num_cpu_threads_per_process 2 /home/kohya_ss/package/sd-scripts/sdxl_train_network.py --config_file
/data/crop-test/model/config_lora-20250125-120615.toml
12:06:15-544535 INFO Command executed.
2025-01-25 12:06:24.777838: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:06:24.823094: E external/local_xla/xla/stream_executor/cuda/cuda_dnn.cc:9261] Unable to register cuDNN factory: Attempting to register factory for plugin cuDNN when one has already been registered
2025-01-25 12:06:24.823142: E external/local_xla/xla/stream_executor/cuda/cuda_fft.cc:607] Unable to register cuFFT factory: Attempting to register factory for plugin cuFFT when one has already been registered
2025-01-25 12:06:24.824557: E external/local_xla/xla/stream_executor/cuda/cuda_blas.cc:1515] Unable to register cuBLAS factory: Attempting to register factory for plugin cuBLAS when one has already been registered
2025-01-25 12:06:24.831922: I external/local_tsl/tsl/cuda/cudart_stub.cc:31] Could not find cuda drivers on your machine, GPU will not be used.
2025-01-25 12:06:24.832176: I tensorflow/core/platform/cpu_feature_guard.cc:182] This TensorFlow binary is optimized to use available CPU instructions in performance-critical operations.
To enable the following instructions: AVX2 AVX512F AVX512_VNNI FMA, in other operations, rebuild TensorFlow with the appropriate compiler flags.
2025-01-25 12:06:25.870362: W tensorflow/compiler/tf2tensorrt/utils/py_utils.cc:38] TF-TRT Warning: Could not find TensorRT
2025-01-25 12:06:27 INFO Loading settings from /data/crop-test/model/config_lora-20250125-120615.toml... train_util.py:4174
INFO /data/crop-test/model/config_lora-20250125-120615 train_util.py:4193
2025-01-25 12:06:27 INFO prepare tokenizers sdxl_train_util.py:138
INFO update token length: 75 sdxl_train_util.py:163
INFO Using DreamBooth method. train_network.py:172
2025-01-25 12:06:28 INFO prepare images. train_util.py:1815
INFO found directory /data/crop-test/img/50_img woman contains 24 image files train_util.py:1762
INFO 1200 train images with repeating. train_util.py:1856
INFO 0 reg images. train_util.py:1859
WARNING no regularization images train_util.py:1864
INFO [Dataset 0] config_util.py:572
batch_size: 1
resolution: (512, 512)
enable_bucket: True
network_multiplier: 1.0
min_bucket_reso: 256
max_bucket_reso: 2048
bucket_reso_steps: 64
bucket_no_upscale: True
[Subset 0 of Dataset 0]
image_dir: "/data/crop-test/img/50_img woman"
image_count: 24
num_repeats: 50
shuffle_caption: False
keep_tokens: 0
keep_tokens_separator:
caption_separator: ,
secondary_separator: None
enable_wildcard: False
caption_dropout_rate: 0.0
caption_dropout_every_n_epoches: 0
caption_tag_dropout_rate: 0.0
caption_prefix: None
caption_suffix: None
color_aug: False
flip_aug: False
face_crop_aug_range: None
random_crop: False
token_warmup_min: 1,
token_warmup_step: 0,
alpha_mask: False,
is_reg: False
class_tokens: img woman
caption_extension: .txt
INFO [Dataset 0] config_util.py:578
INFO loading image sizes. train_util.py:911
100%|██████████| 24/24 [00:00<00:00, 4340.24it/s]
INFO make buckets train_util.py:917
WARNING min_bucket_reso and max_bucket_reso are ignored if bucket_no_upscale is set, because bucket reso is defined by image size automatically train_util.py:934
INFO number of images (including repeats) train_util.py:963
INFO bucket 0: resolution (256, 256), count: 150 train_util.py:968
INFO bucket 1: resolution (320, 320), count: 50 train_util.py:968
INFO bucket 2: resolution (320, 384), count: 50 train_util.py:968
INFO bucket 3: resolution (384, 384), count: 100 train_util.py:968
INFO bucket 4: resolution (384, 448), count: 50 train_util.py:968
INFO bucket 5: resolution (448, 384), count: 50 train_util.py:968
INFO bucket 6: resolution (448, 448), count: 300 train_util.py:968
INFO bucket 7: resolution (448, 512), count: 200 train_util.py:968
INFO bucket 8: resolution (448, 576), count: 50 train_util.py:968
INFO bucket 9: resolution (512, 448), count: 150 train_util.py:968
INFO bucket 10: resolution (512, 512), count: 50 train_util.py:968
INFO mean ar error (without repeats): 0.04031468409762353 train_util.py:973
WARNING clip_skip will be unexpected sdxl_train_util.py:352
INFO preparing accelerator train_network.py:225
accelerator device: cuda
INFO loading model for process 0/1 sdxl_train_util.py:33
INFO load Diffusers pretrained models: stabilityai/stable-diffusion-xl-base-1.0, variant=fp16 sdxl_train_util.py:88
Loading pipeline components...: 100%|██████████| 6/6 [00:49<00:00, 8.20s/it]
2025-01-25 12:07:17 INFO U-Net converted to original U-Net sdxl_train_util.py:124
2025-01-25 12:07:18 INFO Enable xformers for U-Net train_util.py:3040
import network module: networks.lora
2025-01-25 12:07:21 INFO [Dataset 0] train_util.py:2323
INFO caching latents. train_util.py:1095
INFO checking cache validity... train_util.py:1105
100%|██████████| 24/24 [00:00<00:00, 299593.14it/s]
INFO caching latents... train_util.py:1144
0%| | 0/24 [00:00<?, ?it/s]/home/kohya_ss/package/venv/lib/python3.10/site-packages/torch/nn/modules/conv.py:456: UserWarning: Applied workaround for CuDNN issue, install nvrtc.so (Triggered internally at ../aten/src/ATen/native/cudnn/Conv_v8.cpp:80.)
return F.conv2d(input, weight, bias, self.stride,
100%|██████████| 24/24 [00:10<00:00, 2.28it/s]
2025-01-25 12:07:42 INFO create LoRA network. base dim (rank): 8, alpha: 1 lora.py:928
INFO neuron dropout: p=None, rank dropout: p=None, module dropout: p=None lora.py:929
INFO create LoRA for Text Encoder 1: lora.py:1020
INFO create LoRA for Text Encoder 2: lora.py:1020
INFO create LoRA for Text Encoder: 264 modules. lora.py:1028
2025-01-25 12:07:43 INFO create LoRA for U-Net: 722 modules. lora.py:1036
INFO enable LoRA for text encoder: 264 modules lora.py:1077
INFO enable LoRA for U-Net: 722 modules lora.py:1082
prepare optimizer, data loader etc.
INFO use 8-bit AdamW optimizer | {} train_util.py:4327
running training
num train images * repeats: 1200
num reg images: 0
num batches per epoch: 1200
num epochs: 2
batch size per device: 1
gradient accumulation steps = 1
total optimization steps: 1600
steps: 0%| | 0/1600 [00:00<?, ?it/s]
epoch 1/2
2025-01-25 12:07:53 INFO epoch is incremented. current_epoch: 0, epoch: 1 train_util.py:703
steps: 75%|███████▌  | 1200/1600 [22:50<07:36, 1.14s/it, avr_loss=0.136]
saving checkpoint: /data/crop-test/model/last-000001.safetensors
epoch 2/2
2025-01-25 12:30:44 INFO epoch is incremented. current_epoch: 1, epoch: 2 train_util.py:703
steps: 100%|██████████| 1600/1600 [30:20<00:00, 1.14s/it, avr_loss=0.129]
saving checkpoint: /data/crop-test/model/last.safetensors
2025-01-25 12:38:15 INFO model saved. train_network.py:1104
steps: 100%|██████████| 1600/1600 [30:21<00:00, 1.14s/it, avr_loss=0.129]
12:38:23-657178 INFO Training has ended.
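The step counts in the log can be reproduced with a little arithmetic, which also explains why the run shows 2 epochs even though the GUI reported "Epoch: 1". This is a sketch using only the numbers printed above; the ceiling division is an assumption that happens to match the log, not a quote of the trainer's code.

```python
import math

# Values taken from the log above.
num_images = 24
repeats = 50
batch_size = 1
gradient_accumulation_steps = 1
max_train_steps = 1600
lr_warmup_steps = 160

# One epoch covers every image `repeats` times.
steps_per_epoch = math.ceil(num_images * repeats / (batch_size * gradient_accumulation_steps))

# Assumption: the epoch count is rounded up so that max_train_steps fits.
num_epochs = math.ceil(max_train_steps / steps_per_epoch)
steps_in_last_epoch = max_train_steps - (num_epochs - 1) * steps_per_epoch
warmup_fraction = lr_warmup_steps / max_train_steps

print(steps_per_epoch, num_epochs, steps_in_last_epoch, warmup_fraction)  # 1200 2 400 0.1
```

So the second epoch only runs for 400 of its 1200 steps, and the warmup covers the first 10% of training.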
Generation steps
I first copied /data/crop-test/model/last.safetensors to /data. Then I wrote a pipeline script to generate outputs with the trigger word img:
#!/usr/bin/env python3.10
import torch
from diffusers import DiffusionPipeline

# Load the SDXL base model, then attach the trained LoRA under the adapter name "faceTest".
pipe = DiffusionPipeline.from_pretrained("stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float32)
pipe.load_lora_weights("last.safetensors", adapter_name="faceTest", weight_name="faceTest")
pipe.set_adapters(['faceTest'], adapter_weights=[0.75])
pipe.to('cuda')
pipe.fuse_lora(lora_scale=0.75)
pipe.enable_attention_slicing()

prompt = "cinematic photo of jennlaw, studio quality, woman standing in a room, doing a public speech, blur the background, smiling, having a positive face, wearing a v-neck black sweater, 35mm photograph, film, bokeh, professional, 4k, highly detailed"
negative_prompt = "drawing, painting, crayon, sketch, graphite, impressionist, noisy, blurry, soft, deformed, ugly nsfw, lowres, bad anatomy, bad hands, text, error, missing fingers, extra digit, fewer digits, cropped, worst quality, low quality, normal quality, jpeg artifacts, signature, watermark, username, blurry"

# Generate two 768x1024 images for the prompt.
images = pipe(prompt,
              negative_prompt=negative_prompt,
              guidance_scale=5,
              num_inference_steps=50,
              width=768,
              height=1024,
              num_images_per_prompt=2).images

# Save each image into the outputs/ directory.
for idx, img in enumerate(images):
    img.save(f"outputs/{idx}.jpg")
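One thing that may be worth ruling out is whether the generation prompt actually contains the trigger word from the captions ("img"). The sketch below is a plain string check, not part of diffusers; prompt_contains_trigger is a hypothetical helper, and the prompt is copied from the script above.

```python
import re


def prompt_contains_trigger(prompt, trigger):
    """Return True if `trigger` appears as a standalone word in `prompt` (case-insensitive)."""
    words = re.findall(r"[A-Za-z0-9_]+", prompt.lower())
    return trigger.lower() in words


script_prompt = (
    "cinematic photo of jennlaw, studio quality, woman standing in a room, "
    "doing a public speech, blur the background, smiling, having a positive face, "
    "wearing a v-neck black sweater, 35mm photograph, film, bokeh, professional, "
    "4k, highly detailed"
)

print(prompt_contains_trigger(script_prompt, "img"))  # False
print(prompt_contains_trigger("img, a woman with long hair and a black dress", "img"))  # True
```

Whether a missing trigger word explains the mismatch here is not established; the check only shows that the prompt and the captions use different identifiers.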
After a while the results were generated, but they look nothing like the inputs.
These are the inputs:
These are the outputs:
I have no idea what's wrong. I couldn't find anything useful online, and ChatGPT didn't help either.
Does anyone know what's wrong?