Error while training LoRA in kohya_ss (stabilityai/stable-diffusion-xl-base-1.0)

Hello

I have provided 1 image for training. It’s a simple JPG.

I am getting the error below:

Traceback (most recent call last):
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\sd-scripts\sdxl_train_network.py", line 185, in <module>
    trainer.train(args)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\sd-scripts\train_network.py", line 272, in train
    train_dataset_group.cache_latents(vae, args.vae_batch_size, args.cache_latents_to_disk, accelerator.is_main_process)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\sd-scripts\library\train_util.py", line 2325, in cache_latents
    dataset.cache_latents(vae, vae_batch_size, cache_to_disk, is_main_process)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\sd-scripts\library\train_util.py", line 1146, in cache_latents
    cache_batch_latents(vae, cache_to_disk, batch, subset.flip_aug, subset.alpha_mask, subset.random_crop)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\sd-scripts\library\train_util.py", line 2775, in cache_batch_latents
    raise RuntimeError(f"NaN detected in latents: {info.absolute_path}")
RuntimeError: NaN detected in latents: D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\trained-model\img\40_A Event of a Pharma Client A Event of a Pharma Client\Sample1.jpg
Traceback (most recent call last):
  File "D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\runpy.py", line 196, in _run_module_as_main
    return _run_code(code, main_globals, None,
  File "D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\runpy.py", line 86, in _run_code
    exec(code, run_globals)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\venv\Scripts\accelerate.EXE\__main__.py", line 7, in <module>
    sys.exit(main())
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\accelerate_cli.py", line 47, in main
    args.func(args)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 1017, in launch_command
    simple_launcher(args)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\venv\lib\site-packages\accelerate\commands\launch.py", line 637, in simple_launcher
    raise subprocess.CalledProcessError(returncode=process.returncode, cmd=cmd)
subprocess.CalledProcessError: Command '['D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\venv\\Scripts\\python.exe', 'D:/Ganu/AIImage/huggingface/kohya_ss/kohya_ss/sd-scripts/sdxl_train_network.py', '--config_file', 'D:/Ganu/AIImage/huggingface/kohya_ss/kohya_ss/trained-model\\model/config_lora-20250130-144011.toml']' returned non-zero exit status 1.
14:43:08-514022 INFO     Training has ended.

The config file is as follows:

bucket_no_upscale = true
bucket_reso_steps = 64
cache_latents = true
caption_extension = ".txt"
clip_skip = 1
dynamo_backend = "no"
enable_bucket = true
epoch = 1
gradient_accumulation_steps = 1
huber_c = 0.1
huber_schedule = "snr"
learning_rate = 0.0001
logging_dir = "D:/Ganu/AIImage/huggingface/kohya_ss/kohya_ss/trained-model\\log"
loss_type = "l2"
lr_scheduler = "cosine"
lr_scheduler_args = []
lr_scheduler_num_cycles = 1
lr_scheduler_power = 1
lr_warmup_steps = 160
max_bucket_reso = 2048
max_data_loader_n_workers = 0
max_grad_norm = 1
max_timestep = 1000
max_token_length = 75
max_train_steps = 1600
min_bucket_reso = 256
mixed_precision = "fp16"
multires_noise_discount = 0.3
network_alpha = 1
network_args = []
network_dim = 8
network_module = "networks.lora"
noise_offset_type = "Original"
optimizer_args = []
optimizer_type = "AdamW8bit"
output_dir = "D:/Ganu/AIImage/huggingface/kohya_ss/kohya_ss/trained-model\\model"
output_name = "last"
pretrained_model_name_or_path = "stabilityai/stable-diffusion-xl-base-1.0"
prior_loss_weight = 1
resolution = "512,512"
sample_prompts = "D:/Ganu/AIImage/huggingface/kohya_ss/kohya_ss/trained-model\\model\\prompt.txt"
sample_sampler = "euler_a"
save_every_n_epochs = 1
save_model_as = "safetensors"
save_precision = "fp16"
text_encoder_lr = 0.0001
train_batch_size = 1
train_data_dir = "D:/Ganu/AIImage/huggingface/kohya_ss/kohya_ss/trained-model\\img"
unet_lr = 0.0001
xformers = true

Maybe this is it. There is a well-known bug in the VAE of stabilityai/stable-diffusion-xl-base-1.0: it does not work properly at fp16 precision, and with mixed_precision = "fp16" in your config the latent caching step can overflow to NaN.
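If your kohya_ss / sd-scripts build exposes the option (recent versions do; treat the exact key as an assumption about your version), keeping the VAE in fp32 while training in fp16 usually avoids the NaN latents. In the TOML config above that is one extra line:

no_half_vae = true

Alternatively, people work around it by training with mixed_precision = "bf16" (if the GPU supports it) or by using the fixed fp16 VAE, madebyollin/sdxl-vae-fp16-fix.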

Hello

I successfully trained the model. Your help resolved it.

Now when I run the model to generate a new image, I get the following error:

python John-Training-30thJan2025.py
Define the path to the directory containing your model and LoRA weights
Load the base model using StableDiffusionPipeline
Loading pipeline components...: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 5/5 [00:00<00:00,  7.55it/s]
You have disabled the safety checker for <class 'diffusers.pipelines.stable_diffusion.pipeline_stable_diffusion.StableDiffusionPipeline'> by passing `safety_checker=None`. Ensure that you abide to the conditions of the Stable Diffusion license and do not expose unfiltered results in services or applications open to the public. Both the diffusers team and Hugging Face strongly recommend to keep the safety filter enabled in all public facing circumstances, disabling it only for use-cases that involve analyzing network behavior or auditing its results. For more information, please have a look at https://github.com/huggingface/diffusers/pull/254 .
Load the LoRA weights
Invalid LoRA checkpoint. Please check the compatibility and format of the weights file.
Traceback (most recent call last):
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\user\John-Training-30thJan2025.py", line 34, in <module>
    raise e
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\user\John-Training-30thJan2025.py", line 31, in <module>
    pipeline.load_lora_weights(lora_weights_path)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\site-packages\diffusers\loaders\lora_pipeline.py", line 129, in load_lora_weights
    self.load_lora_into_unet(
  File "D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\site-packages\diffusers\loaders\lora_pipeline.py", line 305, in load_lora_into_unet
    unet.load_lora_adapter(
  File "D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\site-packages\diffusers\loaders\peft.py", line 301, in load_lora_adapter
    inject_adapter_in_model(lora_config, self, adapter_name=adapter_name, **peft_kwargs)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\site-packages\peft\mapping.py", line 260, in inject_adapter_in_model
    peft_model = tuner_cls(model, peft_config, adapter_name=adapter_name, low_cpu_mem_usage=low_cpu_mem_usage)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\site-packages\peft\tuners\lora\model.py", line 141, in __init__
    super().__init__(model, config, adapter_name, low_cpu_mem_usage=low_cpu_mem_usage)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\site-packages\peft\tuners\tuners_utils.py", line 184, in __init__
    self.inject_adapter(self.model, adapter_name, low_cpu_mem_usage=low_cpu_mem_usage)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\site-packages\peft\tuners\tuners_utils.py", line 520, in inject_adapter
    raise ValueError(error_msg)
ValueError: Target modules {'8.1.transformer_blocks.5.attn2.to_v', '8.1.transformer_blocks.8.attn1.to_out.0', 'mid_block.1.transformer_blocks.8.attn1.to_k', '1.1.transformer_blocks.0.attn2.to_q', '8.1.transformer_blocks.9.attn1.to_q', 'mid_block.1.proj_out', '7.1.transformer_blocks.5.attn2.to_out.0', 'mid_block.1

My program is as follows:

from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
import torch
import os
import numpy as np
from PIL import Image

# Clear GPU memory before starting 
torch.cuda.empty_cache() 

# Set seed for reproducibility 
#torch.manual_seed(6666666) 
#np.random.seed(6666666)

# Define the path to the directory containing your model and LoRA weights
print("Define the path to the directory containing your model and LoRA weights")
model_dir = "D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\trained-model\\model\\" 
#lora_weights_path = os.path.join(model_dir, "last.safetensors")
lora_weights_path = os.path.join(model_dir, "last.safetensors")

# Load the base model using StableDiffusionPipeline
print("Load the base model using StableDiffusionPipeline")
hf_token = "hf_xxx"  # placeholder; the token is not used below
pipeline = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float32,
).to("cuda")

# Load the LoRA weights
print("Load the LoRA weights")
try:
    pipeline.load_lora_weights(lora_weights_path)
except ValueError as e:
    print("Invalid LoRA checkpoint. Please check the compatibility and format of the weights file.")
    raise e

# Generate an image from a text prompt
print("Generate an image from a text prompt")
text_prompt = "A Event of a Pharma Client"
generated_image = pipeline(prompt=text_prompt).images[0]

# Convert the generated image to a NumPy array
generated_image_np = np.array(generated_image)

# Check the generated image properties
print("Generated image properties:")
print("Shape:", generated_image_np.shape)
print("Data type:", generated_image_np.dtype)
print("Value range:", generated_image_np.min(), "-", generated_image_np.max())

# Handle NaN or infinite values and ensure the range is valid
print("Handle NaN or infinite values and ensure the range is valid")
generated_image_np = np.nan_to_num(generated_image_np, nan=0.0, posinf=255.0, neginf=0.0)
generated_image_np = np.clip(generated_image_np, 0, 255)
generated_image_np = generated_image_np.astype(np.uint8)

# Save or display the generated image
print("Save or display the generated image")

# Convert the NumPy array back to a PIL Image and save or display the generated image
pil_image = Image.fromarray(generated_image_np)
pil_image.save("generated_image.jpg")
pil_image.show()

Hello. It seems to be a compatibility issue with Diffusers. :sweat_smile: It is said to work after converting the LoRA to a format the pipeline can load.
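For reference, here is a minimal sketch of the loading side once the pipeline class matches the checkpoint: stabilityai/stable-diffusion-xl-base-1.0 is an SDXL model, so it needs StableDiffusionXLPipeline (StableDiffusionPipeline builds an SD 1.x UNet, which is why none of the LoRA's target modules are found). Recent diffusers releases convert kohya-format SDXL LoRAs on load, so whether a separate conversion step is still needed depends on your version. Paths are the ones from your posts.

from diffusers import StableDiffusionXLPipeline
import torch

# SDXL checkpoints need the SDXL pipeline class; with StableDiffusionPipeline
# the UNet layout does not match the LoRA's target modules.
pipeline = StableDiffusionXLPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

# Recent diffusers versions translate kohya-ss (A1111-style) LoRA keys on load.
pipeline.load_lora_weights(
    "D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\trained-model\\model",
    weight_name="last.safetensors",
)

image = pipeline(prompt="A Event of a Pharma Client").images[0]
image.save("generated_image.png")

Note that the fp16 VAE issue mentioned for training also applies at inference, which becomes relevant below.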

Hello

I ran the program below and I am getting a blank, black image.

The code

from diffusers import AutoPipelineForText2Image
import torch
import os
import numpy as np
from PIL import Image

# Clear GPU memory before starting 
torch.cuda.empty_cache() 

# Set seed for reproducibility 
#torch.manual_seed(6666666) 
#np.random.seed(6666666)

# Define the path to the directory containing your model and LoRA weights
print("Define the path to the directory containing your model and LoRA weights")
model_dir = "D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\trained-model\\model\\" 
#lora_weights_path = os.path.join(model_dir, "last.safetensors")
lora_weights_path = os.path.join(model_dir, "last.safetensors")

# Load the base model using StableDiffusionPipeline
print("Load the base model using StableDiffusionPipeline")
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
adapter_id = "wangfuyun/PCM_SDXL_LoRAs"

pipeline = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")


# Load the LoRA weights
print("Load the LoRA weights")
try:
    pipeline.load_lora_weights(lora_weights_path, weight_name="last.safetensors")
except ValueError as e:
    print("Invalid LoRA checkpoint. Please check the compatibility and format of the weights file.")
    raise e

# Generate an image from a text prompt
print("Generate an image from a text prompt")
text_prompt = "A Event of a Pharma Client"
generated_image = pipeline(prompt=text_prompt).images[0]

# Convert the image data to a NumPy array
image_data = np.array(generated_image)

# Handle NaN or infinite values
image_data = np.nan_to_num(image_data, nan=0.0, posinf=255.0, neginf=0.0)

# Ensure the values are within the range [0, 255]
image_data = np.clip(image_data, 0, 255)

# Convert to uint8
image_data = image_data.astype(np.uint8)

# Create a PIL image from the NumPy array
final_image = Image.fromarray(image_data)

# Save and display the final image
final_image.save("generated_image.png")
final_image.show()

The output

python John-Training-30thJan2025.py
Define the path to the directory containing your model and LoRA weights
Load the base model using StableDiffusionPipeline
Loading pipeline components...: 100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 7/7 [00:02<00:00,  3.23it/s]
Load the LoRA weights
Generate an image from a text prompt
100%|β–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆβ–ˆ| 50/50 [45:56<00:00, 55.14s/it]
D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\site-packages\diffusers\image_processor.py:147: RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8")

The image it produces is just a black square.

I generated the image like this:

from diffusers import AutoPipelineForText2Image
import torch
import os
import numpy as np
from PIL import Image

# Clear GPU memory before starting 
torch.cuda.empty_cache() 

# Set seed for reproducibility 
#torch.manual_seed(6666666) 
#np.random.seed(6666666)

# Define the path to the directory containing your model and LoRA weights
print("Define the path to the directory containing your model and LoRA weights")
model_dir = "D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\trained-model\\model\\" 
#lora_weights_path = os.path.join(model_dir, "last.safetensors")
lora_weights_path = os.path.join(model_dir, "last.safetensors")

# Load the base model using StableDiffusionPipeline
print("Load the base model using StableDiffusionPipeline")
model_id = "stabilityai/stable-diffusion-xl-base-1.0"
adapter_id = "wangfuyun/PCM_SDXL_LoRAs"

pipeline = AutoPipelineForText2Image.from_pretrained(model_id, torch_dtype=torch.float16, variant="fp16").to("cuda")

# Load the LoRA weights
print("Load the LoRA weights")
try:
    pipeline.load_lora_weights(lora_weights_path, weight_name="last.safetensors")
except ValueError as e:
    print("Invalid LoRA checkpoint. Please check the compatibility and format of the weights file.")
    raise e

# Generate an image from a text prompt
print("Generate an image from a text prompt")
text_prompt = "A Event of a Pharma Client"
generated_image = pipeline(prompt=text_prompt).images[0]
generated_image.save("generated_image.png")
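If the output still comes out black at fp16, that is the same VAE problem at inference time: the decode produces NaNs, and the RuntimeWarning: invalid value encountered in cast is the symptom. A common workaround, sketched below under the assumption that you are fine using the community VAE madebyollin/sdxl-vae-fp16-fix, is to swap that VAE into the pipeline:

from diffusers import AutoPipelineForText2Image, AutoencoderKL
import torch

# This VAE stays finite in fp16, so the decoded image is not NaN/black.
vae = AutoencoderKL.from_pretrained(
    "madebyollin/sdxl-vae-fp16-fix",
    torch_dtype=torch.float16,
)

pipeline = AutoPipelineForText2Image.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0",
    vae=vae,
    torch_dtype=torch.float16,
    variant="fp16",
).to("cuda")

pipeline.load_lora_weights(
    "D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\trained-model\\model",
    weight_name="last.safetensors",
)

image = pipeline(prompt="A Event of a Pharma Client").images[0]
image.save("generated_image.png")

Keeping the whole pipeline in torch.float32, as in your first script, also avoids the NaNs, at the cost of more VRAM and slower generation.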