I was playing with the pipeline, trying to create a photo of my dog wearing a gold chain.
I created a pipeline:
from diffusers import StableDiffusionImg2ImgPipeline
pipe = StableDiffusionImg2ImgPipeline.from_pretrained("stabilityai/stable-diffusion-2-1").to(device)
Loaded a photo from my local machine:
from PIL import Image
init_image = Image.open("/Users/alex/Downloads/Meelo1.png").convert("RGB")
Wrote a prompt, then passed the prompt and the image to the pipeline:
prompt = "A dog face wearing a thick gold chain"
image = pipe(prompt=prompt, image=resized_image, strength=0.3, num_inference_steps=50).images[0]
When I displayed the generated image, it was just the original photo with some noise. Nothing resembling a gold chain was generated. I also tried adding a baseball cap or hat instead and got the same result, and I tried adjusting the prompt as well.
My question is: is this the correct and sufficient way to achieve what I want?
Thanks,
Alex
With image-to-image, the overall composition tends to be preserved, but there is a strong tendency for everything else to be redrawn. At a low strength such as 0.3 the output stays close to the original photo; if you increase the strength, the necklace may be drawn, but there is a good chance that a different-looking dog will appear.
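To make the strength point concrete: as I understand the diffusers img2img implementation (its `get_timesteps` step), `strength` decides how far into the noise schedule the input photo is pushed, and only about `num_inference_steps * strength` denoising steps are actually run. The helper below is a hypothetical sketch that reproduces that arithmetic; `effective_denoising_steps` is not part of diffusers.

```python
def effective_denoising_steps(num_inference_steps: int, strength: float) -> int:
    """Approximate how many denoising steps an img2img run performs.

    Assumption: mirrors the arithmetic in diffusers' img2img `get_timesteps`,
    where only the last `strength` fraction of the schedule is executed.
    """
    if not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be in [0, 1]")
    # Number of schedule steps that the input image is noised "into".
    init_timestep = min(int(num_inference_steps * strength), num_inference_steps)
    # Index in the schedule where denoising starts.
    t_start = max(num_inference_steps - init_timestep, 0)
    return num_inference_steps - t_start


for strength in (0.3, 0.5, 0.75, 1.0):
    print(strength, effective_denoising_steps(50, strength))
```

With strength=0.3 and 50 steps, only 15 steps are run on a lightly noised version of the photo, which is consistent with getting back the original image plus some noise.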
For what you want to do, inpainting is probably the best approach. ControlNet lets you do more advanced things, but it is considerably harder to use.
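A minimal inpainting sketch, assuming the `stabilityai/stable-diffusion-2-inpainting` checkpoint and diffusers' `StableDiffusionInpaintPipeline`. The mask is white where the model may repaint (around the neck) and black where the photo must be kept, so the rest of the dog is left alone. `make_neck_mask`, `add_gold_chain`, the prompt, and the box coordinates are hypothetical and need adjusting to your photo.

```python
from PIL import Image, ImageDraw


def make_neck_mask(size, box):
    """Build an inpainting mask: 255 (white) = repaint, 0 (black) = keep.

    size: (width, height) of the image; box: (x0, y0, x1, y1) region to repaint.
    """
    mask = Image.new("L", size, 0)
    ImageDraw.Draw(mask).rectangle(box, fill=255)
    return mask


def add_gold_chain(image_path, box, device="cuda"):
    # Imported here so the mask helper above works without diffusers/torch installed.
    import torch
    from diffusers import StableDiffusionInpaintPipeline

    # Half precision only on CUDA; CPU and MPS are safer in float32.
    dtype = torch.float16 if device == "cuda" else torch.float32
    pipe = StableDiffusionInpaintPipeline.from_pretrained(
        "stabilityai/stable-diffusion-2-inpainting", torch_dtype=dtype
    ).to(device)

    # Assumption: 512x512 input; a non-square photo will be stretched by this resize.
    init_image = Image.open(image_path).convert("RGB").resize((512, 512))
    mask_image = make_neck_mask(init_image.size, box)

    result = pipe(
        prompt="a thick gold chain necklace around a dog's neck",
        image=init_image,
        mask_image=mask_image,
        num_inference_steps=28,
        guidance_scale=7.5,
    )
    return result.images[0]
```

For example, calling `add_gold_chain("/Users/alex/Downloads/Meelo1.png", box=(150, 300, 370, 420), device="mps")` would repaint only that rectangle. A mask drawn by hand in an image editor and loaded with `Image.open(...).convert("L")` works just as well as the rectangle.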
Thanks, I did several runs and confirmed what you said. I'm going to try inpainting and see what results I get.
I also found that if I use a large number of inference steps (say 1000 steps with strength = 0.3), it tends to produce a warning like this:
image_process.py line 147: RuntimeWarning: invalid value encountered in cast images = (images * 255).round().astype("uint8")
and returns a black image. Do you know what is causing this warning and the black image? Thanks!
A black image is often returned when using an older GeForce GPU (the 10x0 generation) or when the output is caught by the safety_checker. I have also heard that it can happen when there is simply not enough RAM.
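One way to tell these cases apart is to ask the pipeline for NumPy output (`output_type="np"`) and inspect it before anything is converted to `uint8`. Recent NumPy versions emit "invalid value encountered in cast" when NaN or infinite values are cast to an integer type, so NaNs in the decoded image point to a numerical problem (half precision is one common suspect), while a clean all-zero image with a True entry in `nsfw_content_detected` points to the safety checker. `diagnose_black_output` is a hypothetical helper, not a diffusers function.

```python
import numpy as np


def diagnose_black_output(images, nsfw_flags=None):
    """Guess why images in a batch came back black.

    images: float array in [0, 1] with shape (batch, H, W, C), e.g. the
            `.images` of a pipeline call made with output_type="np".
    nsfw_flags: the output's `nsfw_content_detected` list, or None if the
                pipeline has no safety checker.
    """
    reasons = []
    for i, img in enumerate(images):
        if not np.isfinite(img).all():
            # NaN/inf values trigger "invalid value encountered in cast"
            # when the array is later converted to uint8.
            reasons.append(f"image {i}: contains NaN/inf (numerical problem)")
        elif nsfw_flags is not None and nsfw_flags[i]:
            reasons.append(f"image {i}: replaced by the safety checker")
        elif img.max() == 0:
            reasons.append(f"image {i}: all zeros, cause unclear")
    return reasons
```

For example: `out = pipe(prompt=prompt, image=init_image, strength=0.3, output_type="np")` followed by `print(diagnose_black_output(out.images, out.nsfw_content_detected))`.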
Also, somewhere between 20 and 100 steps is usually sufficient, and around 28 is usually fine. Even for fairly complex images, there is no benefit to making the step count very large; it is more likely to make the result look strange.
The same goes for guidance_scale, which should usually be between 3.5 and 7.5. Raising these parameters does not automatically improve image quality.
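To see the effect of these ranges on your own photo, a small sweep with a fixed seed is a cheap experiment. This is a hypothetical sketch: `parameter_grid` and `run_sweep` are illustrative names, and it assumes `pipe` is the img2img pipeline created earlier in the thread. Re-seeding a `torch.Generator` before each run keeps the starting noise identical, so differences between the outputs come from the parameters rather than from the random seed.

```python
from itertools import product


def parameter_grid(steps=(20, 28, 50), guidance_scales=(3.5, 5.0, 7.5)):
    """All (num_inference_steps, guidance_scale) combinations to compare."""
    return [
        {"num_inference_steps": s, "guidance_scale": g}
        for s, g in product(steps, guidance_scales)
    ]


def run_sweep(pipe, prompt, image, strength=0.3, seed=0):
    """Run the pipeline once per grid point; returns {(steps, scale): PIL image}."""
    import torch  # only needed when actually running the pipeline

    results = {}
    for params in parameter_grid():
        # Same seed for every run so only the parameters change.
        generator = torch.Generator(device="cpu").manual_seed(seed)
        out = pipe(prompt=prompt, image=image, strength=strength,
                   generator=generator, **params)
        key = (params["num_inference_steps"], params["guidance_scale"])
        results[key] = out.images[0]
    return results
```

Saving the nine results side by side usually makes it obvious that going beyond these ranges does not buy extra quality.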
Thanks a lot for the insights. They are very helpful.