Summary
I am facing an issue with StableDiffusionImg2ImgPipeline where it keeps throwing a ValueError stating that the input image format is incorrect, even though I am passing a PIL.Image.Image object as required.
Error Message
ValueError: Input is in incorrect format. Currently, we only support <class 'PIL.Image.Image'>, <class 'numpy.ndarray'>, <class 'torch.Tensor'>
Steps Taken
I have tried the following solutions, but the error persists:
- Checked input format
  - Used print(type(init_image_reloaded)) and confirmed that it is <class 'PIL.Image.Image'>.
- Ensured image size is correct
  - Image size is (512, 512), which is a multiple of 8.
- Updated dependencies
  - Ran: !pip install --upgrade diffusers transformers accelerate ftfy and !pip install --upgrade pydantic
  - My diffusers version is the latest (see the version check sketch after this list).
- Tried re-saving the image to ensure a correct format
  - Saved and reloaded as PNG: init_image.save("temp_image.png", format="PNG") followed by init_image_reloaded = Image.open("temp_image.png").convert("RGB")
- Checked whether StableDiffusionImg2ImgPipeline works with NumPy arrays
  - Tried passing np.array(init_image_reloaded) instead of the PIL.Image.Image, but the same error occurs.
- Checked Python version and execution environment
  - Using Python 3.11
  - Running on Google Colab
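For reference, a quick sanity check like the one below (a sketch; I have not exhaustively verified the parameter list it prints) should show the installed diffusers version and which keyword arguments the pipeline's __call__ actually accepts:

import inspect
import diffusers
from diffusers import StableDiffusionImg2ImgPipeline

# Print the installed diffusers version.
print("diffusers version:", diffusers.__version__)

# List the keyword arguments the img2img pipeline's __call__ accepts;
# if "init_image" is not in this list, the argument name has changed.
params = inspect.signature(StableDiffusionImg2ImgPipeline.__call__).parameters
print(list(params))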
Code Example
import torch
from diffusers import StableDiffusionImg2ImgPipeline
from PIL import Image
import numpy as np
import os
# --- Load input image ---
init_image = Image.open("/content/food_nighit_ramen.jpeg").convert("RGB")
print("Original image type:", type(init_image), "size:", init_image.size, "mode:", init_image.mode)
# --- Re-save and reload image ---
temp_image_path = "temp_image.png"
init_image.save(temp_image_path, format="PNG")
init_image_reloaded = Image.open(temp_image_path).convert("RGB")
print("Reloaded image type:", type(init_image_reloaded), "size:", init_image_reloaded.size, "mode:", init_image_reloaded.mode)
# --- Load model ---
model_id = "runwayml/stable-diffusion-v1-5"
pipe = StableDiffusionImg2ImgPipeline.from_pretrained(model_id, torch_dtype=torch.float16)
pipe = pipe.to("cuda")
# --- Set parameters ---
prompt = "A majestic fantasy creature evolving with vibrant flames and sparkling water effects, ultra-detailed, epic fantasy art"
num_inference_steps = 50
guidance_scale = 7.5
strength = 0.7
# --- Run Image-to-Image (PIL.Image is passed) ---
result = pipe(
    prompt=prompt,
    init_image=init_image_reloaded,  # PIL.Image should be accepted
    strength=strength,
    guidance_scale=guidance_scale,
    num_inference_steps=num_inference_steps
)
generated_image = result.images[0]
# --- Save result ---
generated_image.save("generated_evolved_monster.png")
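One possibility I have not ruled out: I believe recent diffusers releases renamed the img2img pipeline's init_image argument to image. The sketch below (untested on my side; the rename is my assumption) passes the same PIL image under that name:

# Sketch: same call as above, but passing the image as "image" instead of
# "init_image", in case the keyword was renamed in a newer diffusers release.
result = pipe(
    prompt=prompt,
    image=init_image_reloaded,  # same PIL.Image.Image object as before
    strength=strength,
    guidance_scale=guidance_scale,
    num_inference_steps=num_inference_steps
)
result.images[0].save("generated_evolved_monster.png")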
Questions
Why is StableDiffusionImg2ImgPipeline rejecting PIL.Image.Image as input?
Are there additional preprocessing steps required before passing the image (for example, manual tensor conversion like the sketch after these questions)?
Has there been a recent breaking change in diffusers that affects init_image input?
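For the preprocessing question above, this is roughly the manual conversion I had in mind; the [-1, 1] scaling is my assumption about what the pipeline does internally, not something I have confirmed from the source:

import numpy as np
import torch
from PIL import Image

# Sketch of manual preprocessing: RGB PIL image -> float tensor of shape
# (1, 3, H, W) scaled to [-1, 1] (assumed to match the pipeline's internals).
img = Image.open("temp_image.png").convert("RGB")
arr = np.asarray(img).astype(np.float32) / 255.0       # (H, W, 3) in [0, 1]
tensor = torch.from_numpy(arr).permute(2, 0, 1)[None]  # (1, 3, H, W)
tensor = tensor * 2.0 - 1.0                            # scale to [-1, 1]
print(tensor.shape, tensor.min().item(), tensor.max().item())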
Any help would be greatly appreciated. Thank you!