Summary
I am facing an issue with `StableDiffusionImg2ImgPipeline`: it keeps throwing a `ValueError` stating that the input image format is incorrect, even though I am passing a `PIL.Image.Image` object as required.
Error Message
```
ValueError: Input is in incorrect format. Currently, we only support <class 'PIL.Image.Image'>, <class 'numpy.ndarray'>, <class 'torch.Tensor'>
```
Steps Taken
I have tried the following solutions, but the error persists:
- **Checked input format**
  - Used `print(type(init_image_reloaded))` and confirmed that it is `<class 'PIL.Image.Image'>` (a combined sanity-check cell for the first three checks is shown after this list).
- **Ensured image size is correct**
  - Image size is `(512, 512)`, which is a multiple of 8.
- **Updated dependencies**
  - Ran:
    ```
    !pip install --upgrade diffusers transformers accelerate ftfy
    !pip install --upgrade pydantic
    ```
  - My `diffusers` version is the latest available.
- **Tried re-saving the image to ensure correct format**
  - Saved and reloaded as PNG:
    ```python
    init_image.save("temp_image.png", format="PNG")
    init_image_reloaded = Image.open("temp_image.png").convert("RGB")
    ```
- **Checked whether `StableDiffusionImg2ImgPipeline` works with NumPy arrays**
  - Tried passing `np.array(init_image_reloaded)` instead of a `PIL.Image.Image`, but the same error occurs.
- **Checked Python version and execution environment**
  - Using Python 3.11
  - Running on Google Colab
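For completeness, here are the first three checks condensed into a single Colab cell; the values in the comments are what I see in my session:

```python
from PIL import Image
import diffusers, torch, numpy as np

# Check 1: the input really is a PIL image in RGB mode
img = Image.open("/content/food_nighit_ramen.jpeg").convert("RGB")
print(type(img))   # <class 'PIL.Image.Image'>
print(img.mode)    # RGB

# Check 2: both dimensions are multiples of 8
print(img.size)    # (512, 512)
assert img.size[0] % 8 == 0 and img.size[1] % 8 == 0

# Check 3: library versions after the pip upgrade
print("diffusers:", diffusers.__version__)
print("torch:", torch.__version__)
print("numpy:", np.__version__)
```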
Code Example
```python
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image
import numpy as np

# --- Load input image ---
init_image = Image.open("/content/food_nighit_ramen.jpeg").convert("RGB")
print("Original image type:", type(init_image), "size:", init_image.size, "mode:", init_image.mode)

# --- Re-save and reload image ---
temp_image_path = "temp_image.png"
init_image.save(temp_image_path, format="PNG")
init_image_reloaded = Image.open(temp_image_path).convert("RGB")
print("Reloaded image type:", type(init_image_reloaded), "size:", init_image_reloaded.size, "mode:", init_image_reloaded.mode)

# --- Load model ---
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")

# --- Set parameters ---
prompt = "A majestic fantasy creature evolving with vibrant flames and sparkling water effects, ultra-detailed, epic fantasy art"
num_inference_steps = 50
guidance_scale = 7.5
strength = 0.7

# --- Run image-to-image (PIL.Image is passed) ---
result = pipe(
    prompt=prompt,
    init_image=init_image_reloaded,  # PIL.Image should be accepted here
    strength=strength,
    guidance_scale=guidance_scale,
    num_inference_steps=num_inference_steps,
)
generated_image = result.images[0]

# --- Save result ---
generated_image.save("generated_evolved_monster.png")
```
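One thing I noticed while reading the current `diffusers` documentation: the img2img examples there pass the input via an `image=` keyword rather than `init_image=`. I don't know whether this rename is related to my error (see question 3 below), but for reference this is the documented call pattern:

```python
# Call pattern from the current diffusers img2img docs:
# the input image keyword is `image`, not `init_image`.
result = pipe(
    prompt=prompt,
    image=init_image_reloaded,
    strength=strength,
    guidance_scale=guidance_scale,
    num_inference_steps=num_inference_steps,
)
```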
Questions
1. Why is `StableDiffusionImg2ImgPipeline` rejecting `PIL.Image.Image` as input?
2. Are there additional preprocessing steps required before passing the image? (A manual preprocessing sketch I considered is shown after these questions.)
3. Has there been a recent breaking change in `diffusers` that affects the `init_image` input?
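Regarding question 2, this is the manual preprocessing I would try next, based on my understanding that the pipeline normalizes images to a `[-1, 1]` float tensor in BCHW layout; this is an assumption on my part, not something I've confirmed in the source:

```python
import numpy as np
import torch

# Assumed preprocessing: HWC uint8 in [0, 255] -> BCHW float16 in [-1, 1].
# I have NOT confirmed the pipeline expects exactly this; it is my reading
# of how diffusers normalizes input images.
arr = np.array(init_image_reloaded).astype(np.float32) / 255.0  # (512, 512, 3)
tensor = torch.from_numpy(arr).permute(2, 0, 1).unsqueeze(0)    # (1, 3, 512, 512)
tensor = tensor * 2.0 - 1.0
tensor = tensor.to("cuda", dtype=torch.float16)
```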
Any help would be greatly appreciated. Thank you!