It was generated properly (though there is no LoRA)… Is it an issue with the environment, or is the library version not matching?
from diffusers import StableDiffusionPipeline, EulerDiscreteScheduler
import torch
import os
import numpy as np
from PIL import Image
# Define the path to the directory containing your model and LoRA weights
print("Define the path to the directory containing your model and LoRA weights")
model_dir = "D:\\Ganu\\AIImage\\huggingface\\kohya_ss\\kohya_ss\\trained-model\\model\\"
lora_weights_path = os.path.join(model_dir, "last.safetensors")  # defined but not applied in this "no LoRA" run
# Load the base model using StableDiffusionPipeline
print("Load the base model using StableDiffusionPipeline")
pipeline = StableDiffusionPipeline.from_pretrained(
"stabilityai/stable-diffusion-2-1-base",
torch_dtype=torch.float16
).to("cuda")
# Generate an image from a text prompt
print("Generate an image from a text prompt")
text_prompt = "A beautiful Woman"
pil_image = pipeline(prompt=text_prompt).images[0]
# Save or display the generated image
print("Save or display the generated image")
# Convert the NumPy array to a PIL Image and save or display the generated image
pil_image.save("generated_image.jpg")
pil_image.show()
Hmm, that doesn't look strange.
If I had to say, PyTorch would be my first suspect, but if that were broken, I'd expect no black image to be generated at all and the script to simply crash…
Is the safety_checker's blackout feature being triggered? Was that present in 2.1 as well?
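If the safety checker were the cause, the quickest way to rule it out is to load the pipeline without it. For what it's worth, the SD 2.x checkpoints on the Hub generally ship without a safety checker, but disabling it explicitly is still the easiest check. A minimal sketch, assuming the same base model as above (only advisable for local debugging):

from diffusers import StableDiffusionPipeline
import torch

# Load the same base model but without the safety checker, so a false-positive
# NSFW flag cannot silently replace the output with a black image.
pipe = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",
    torch_dtype=torch.float16,
    safety_checker=None,            # skip the checker entirely
    requires_safety_checker=False,  # silence the warning about removing it
).to("cuda")

image = pipe(prompt="A beautiful Woman").images[0]
image.save("no_safety_checker.jpg")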
generated_image = pipeline(prompt=text_prompt).images[0]
# Handle NaN or infinite values and ensure the range is valid
print("Handle NaN or infinite values and ensure the range is valid")
generated_image = np.array(generated_image)  # the pipeline returns a PIL image by default
generated_image = np.nan_to_num(generated_image, nan=0.0, posinf=255.0, neginf=0.0)
generated_image = np.clip(generated_image, 0, 255)
generated_image = generated_image.astype(np.uint8)
# Save or display the generated image
print("Save or display the generated image")
# Convert the NumPy array back to a PIL Image and save or display the generated image
pil_image = Image.fromarray(generated_image)
For now, the pipeline returns a PIL image, so that part is basically OK. But the problem isn't here, is it…
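If the NaN check is meant to actually catch anything, it has to run before the pipeline's image processor casts to uint8 (that cast is where the RuntimeWarning in the log further down comes from). A minimal sketch, assuming the pipeline from above is already loaded; output_type="np" makes the call return float arrays in [0, 1] instead of an already-converted PIL image:

# Ask the pipeline for raw float output so NaN/inf can still be detected.
result = pipeline(prompt=text_prompt, output_type="np")
arr = result.images[0]  # float32 array of shape (H, W, 3), values nominally in [0, 1]

if np.isnan(arr).any() or np.isinf(arr).any():
    print("NaN/inf in the decoded image - fp16 numerics are the likely culprit")

arr = np.nan_to_num(arr, nan=0.0, posinf=1.0, neginf=0.0)
arr = (np.clip(arr, 0.0, 1.0) * 255).round().astype(np.uint8)
pil_image = Image.fromarray(arr)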
D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\user>python John-13thJan2024-NoLora.py
Define the path to the directory containing your model and LoRA weights
Load the base model using StableDiffusionPipeline
Traceback (most recent call last):
  File "D:\Ganu\AIImage\huggingface\kohya_ss\kohya_ss\user\John-13thJan2024-NoLora.py", line 14, in <module>
    pipeline = StableDiffusionPipeline.from_pretrained(
  File "D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\site-packages\huggingface_hub\utils\_validators.py", line 114, in _inner_fn
    return fn(*args, **kwargs)
  File "D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\site-packages\diffusers\pipelines\pipeline_utils.py", line 710, in from_pretrained
    raise NotImplementedError(
NotImplementedError: auto not supported. Supported strategies are: balanced
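That NotImplementedError is what diffusers raises when from_pretrained receives device_map="auto"; per the message, the only sharding strategy supported for pipelines is "balanced". The script shown above does not pass device_map, so the run that produced this traceback was presumably using a version of the script that did. A hedged sketch of the two ways around it:

# Option 1: omit device_map entirely and place the whole pipeline on one GPU.
pipeline = StableDiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-2-1-base",
    torch_dtype=torch.float16,
).to("cuda")

# Option 2: if sharding across devices is really needed, use the one strategy
# the error message says is supported (requires a recent diffusers + accelerate):
# pipeline = StableDiffusionPipeline.from_pretrained(
#     "stabilityai/stable-diffusion-2-1-base",
#     torch_dtype=torch.float16,
#     device_map="balanced",
# )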
python John-13thJan2024-NoLora.py
Define the path to the directory containing your model and LoRA weights
Load the base model using StableDiffusionPipeline
Loading pipeline components...:  67%|████████████████████████████                  | 4/6 [00:26<00:15,  7.91s/it]Taking `'Attention' object has no attribute 'key'` while using `accelerate.load_checkpoint_and_dispatch` to mean C:\Users\ADMIN\.cache\huggingface\hub\models--stabilityai--stable-diffusion-2-1-base\snapshots\5ede9e4bf3e3fd1cb0ef2f7a3fff13ee514fdf06\vae was saved with deprecated attention block weight names. We will load it with the deprecated attention block names and convert them on the fly to the new attention block format. Please re-save the model after this conversion, so we don't have to do the on the fly renaming in the future. If the model is from a hub checkpoint, please also re-upload it or open a PR on the original repository.
Loading pipeline components...: 100%|██████████████████████████████████████████████| 6/6 [00:29<00:00,  4.84s/it]
Generate an image from a text prompt
100%|██████████████████████████████████████████████| 50/50 [02:38<00:00,  3.18s/it]
D:\Ganu\AIImage\huggingface\kohya_ss\Python310\lib\site-packages\diffusers\image_processor.py:147: RuntimeWarning: invalid value encountered in cast
  images = (images * 255).round().astype("uint8")
<PIL.Image.Image image mode=RGB size=512x512 at 0x1B4099BFF70>
Save or display the generated image
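Since the opening question was whether the environment or the library versions are at fault, it may help to print them at the top of the script before anything else. A small diagnostic sketch (nothing here is specific to this model):

import torch, diffusers, transformers, accelerate

print("torch       :", torch.__version__, "| CUDA available:", torch.cuda.is_available())
if torch.cuda.is_available():
    print("GPU         :", torch.cuda.get_device_name(0))
print("diffusers   :", diffusers.__version__)
print("transformers:", transformers.__version__)
print("accelerate  :", accelerate.__version__)

If CUDA is reported as unavailable, or the versions are far from what the diffusers release notes expect, that would point back to the environment rather than the model.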
To begin with, there is the philosophical question of "what is a copy?", and then there is the issue of infringement of rights when a rights holder for the original photograph or picture exists. For that reason, I think there are cases where something is treated as a copy in the legal or social sense, but technically, on a computer, what a diffusion model does is not a copy of data or a database lookup; it is closer to "memory recall" from a neural network.
Even if you drew a hyper-realistic picture by hand that looked exactly like an existing work of art, at the same quality as a photograph, it might be considered a reproduction, but it wouldn't be called a copy. Someone might get angry about it, though.
No matter which algorithm is used, the model simply doesn't have the space to store all of the images used for training.
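A rough back-of-the-envelope calculation makes the point. The figures below are only order-of-magnitude assumptions (roughly a billion parameters in fp16 versus a couple of billion training images), not exact numbers for any specific model:

# Approximate sizes: ~1e9 parameters at 2 bytes each (fp16) versus
# ~2e9 training images at a (generously small) 50 KB apiece.
params = 1.0e9
model_bytes = params * 2                 # ~2 GB of weights
images = 2.0e9                           # order-of-magnitude training-set size
dataset_bytes = images * 50_000          # ~100 TB of pixels

print(f"model  : {model_bytes / 1e9:.1f} GB")
print(f"dataset: {dataset_bytes / 1e12:.1f} TB")
print(f"bytes of weights per training image: {model_bytes / images:.1f}")

At roughly one byte of weights per training image, verbatim storage is simply not possible; what is learned is a compressed statistical representation.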
Is there an AI model which generates images from a text prompt?
I think all of the currently popular text-to-image models are basically capable of this. (SD 2.1 can do it as well.)
If you want prompts that are closer to natural language, you can get there with newer architectures such as FLUX. (In that case the model is too heavy to run locally here, so I think you would have to use some kind of cloud service; see the sketch below.)
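For example, one way to try FLUX from a cloud service is Hugging Face's Inference API via huggingface_hub. A sketch, assuming you have an HF access token configured and that the named checkpoint is available through an inference provider you can use:

from huggingface_hub import InferenceClient

# Assumes a Hugging Face token (e.g. in the HF_TOKEN environment variable)
# and that this FLUX checkpoint is served by a provider you have access to.
client = InferenceClient(model="black-forest-labs/FLUX.1-schnell")
image = client.text_to_image("A beautiful woman reading a book in a sunlit cafe")
image.save("flux_schnell.jpg")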
And the model hasn't been trained on any real data, something like DeepMind's AlphaGo?
They say that for Go, data from real professional players is no longer needed to strengthen the model. In the case of Go there is a mathematically correct answer, so that approach is possible; you just need to reduce the number of incorrect answers.
However, words, pictures and photographs are almost entirely dependent on human perception, and are a kind of illusion created by humans. Birds and insects see colors differently, and the concepts that words refer to are even more unstable. In other words, while these models may be suitable or unsuitable for a given purpose, there are not many right or wrong answers, and relying solely on mathematics is probably not a good approach. The only thing that can be guaranteed physically is shape.
In addition, there has recently been a widespread attempt to train AI models using data output by AI models that have reached a certain level of maturity (synthetic data). However, this does not mean that real-world data has been excluded.
Also, there are several models that have been trained using only data that does not raise any legal issues (Creative Commons-compliant models). There are some on HF as well.
However, I don't know of any model that doesn't use any real-world data at all. If you did that, I think you would get a model that generates some kind of image that doesn't exist in the real world (I don't even know whether humans would be able to recognize it as an image)…
Well, if you initialize the model weights randomly and then train it, you might be able to get something close, but the question is how to provide the training images without using real-world data… It might be a chicken-and-egg problem.