Invalid image format

I’m encountering a persistent issue when running the StableDiffusionInpaintPipeline for an inpainting task. Despite passing inputs in the expected formats (both the image and mask are in PIL.Image.Image format with correct sizes), I keep receiving the following error:

ValueError: Input is in incorrect format. Currently, we only support <class 'PIL.Image.Image'>, <class 'numpy.ndarray'>, <class 'torch.Tensor'>

Here’s the code that triggers the error:

import torch
from PIL import Image

# image_np, black_mask, prompt, model, and IMG_INPAINTING_NEG_PROMPT are defined earlier (not shown)

# Image and mask setup
image_pil = Image.fromarray(image_np)
mask_pil = Image.fromarray(black_mask).convert("L")

# Generator for reproducibility
generator = torch.Generator(device="cuda").manual_seed(0)

image = model["pipeline"](
    prompt=prompt,
    negative_prompt=IMG_INPAINTING_NEG_PROMPT,
    image=image_pil,  # PIL Image
    mask=mask_pil,    # Grayscale mask (mode "L")
    guidance_scale=8.0,
    num_inference_steps=50,
    generator=generator,
).images[0]

Image and Mask Details:

  • Image size: (512, 768), mode: RGB
  • Mask size: (512, 768), mode: L
  • The mask is binary (contains only 0 and 255 values).

I’ve also tried using a simple manually created mask to ensure that FastSAM-generated masks aren’t causing the issue, but I still get the same error.

It looks like your call is failing at the check below, and I suspect this may be a bug in Diffusers:

def is_valid_image(image):
    return isinstance(image, PIL.Image.Image) or isinstance(image, (np.ndarray, torch.Tensor)) and image.ndim in (2, 3)

Maybe this is the intended version, with the second condition explicitly parenthesized:

def is_valid_image(image):
    return isinstance(image, PIL.Image.Image) or (isinstance(image, (np.ndarray, torch.Tensor)) and image.ndim in (2, 3))

ndim is not an attribute of PIL.Image.Image.
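A quick way to compare the two versions is to run both on the same inputs. The sketch below uses hypothetical stand-in classes (FakePILImage, FakeArray) instead of the real PIL, NumPy, or PyTorch types, so it only illustrates how Python evaluates the expression, not what Diffusers does internally. Because `and` binds more tightly than `or` in Python, both versions should agree on every input, and the `or` short-circuits before `ndim` is ever read on a PIL-like object:

```python
# Stand-in classes (hypothetical) so the sketch runs without Pillow, NumPy, or PyTorch.
class FakePILImage:
    """Mimics PIL.Image.Image: it has no `ndim` attribute."""


class FakeArray:
    """Mimics an ndarray or tensor: it exposes `ndim`."""

    def __init__(self, ndim):
        self.ndim = ndim


def is_valid_image_original(image):
    # No parentheses around the `and` clause, as in the first snippet.
    return isinstance(image, FakePILImage) or isinstance(image, FakeArray) and image.ndim in (2, 3)


def is_valid_image_parenthesized(image):
    # Explicit parentheses, as in the second snippet.
    return isinstance(image, FakePILImage) or (isinstance(image, FakeArray) and image.ndim in (2, 3))


samples = {
    "pil": FakePILImage(),
    "array_2d": FakeArray(2),
    "array_3d": FakeArray(3),
    "array_4d": FakeArray(4),
    "none": None,
}

# Evaluate both versions on every sample and keep the pair of results.
results = {
    name: (is_valid_image_original(obj), is_valid_image_parenthesized(obj))
    for name, obj in samples.items()
}

for name, (original, parenthesized) in results.items():
    print(f"{name}: original={original}, parenthesized={parenthesized}")
```

If the two versions agree like this on the real types as well, the parentheses make the intent clearer but would not change which inputs pass, so the cause of the ValueError may lie elsewhere.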

For now, it should be possible to get past this check by passing the image and mask as NumPy arrays instead of PIL images.
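Separately, it may be worth double-checking the keyword used for the mask. The snippet in the question passes `mask=...`, while, as far as I can tell, the documented parameter name on StableDiffusionInpaintPipeline is `mask_image`. If the pipeline's `__call__` also accepts `**kwargs` (an assumption here, not verified against the installed version), a misspelled keyword would be accepted silently and the mask would stay None. The sketch below shows the effect with a hypothetical stand-in function (fake_inpaint_call) and a small hypothetical helper (unknown_keywords) built on the standard `inspect` module:

```python
import inspect


def fake_inpaint_call(prompt=None, image=None, mask_image=None, **kwargs):
    """Hypothetical stand-in for a pipeline __call__ that also accepts **kwargs."""
    return {"mask_image": mask_image, "ignored": sorted(kwargs)}


def unknown_keywords(func, **call_kwargs):
    """Return the keyword names that `func` does not declare explicitly."""
    variadic = (inspect.Parameter.VAR_POSITIONAL, inspect.Parameter.VAR_KEYWORD)
    declared = {
        name
        for name, param in inspect.signature(func).parameters.items()
        if param.kind not in variadic
    }
    return sorted(set(call_kwargs) - declared)


# The misspelled keyword is accepted without error, but the mask never arrives.
outcome = fake_inpaint_call(prompt="a cat", image="img", mask="msk")
print(outcome)

# Checking against the declared parameters flags the stray name.
stray = unknown_keywords(fake_inpaint_call, prompt="a cat", image="img", mask="msk")
print(stray)
```

The same helper could in principle be pointed at the real pipeline's `__call__` to see whether `mask` is among its declared parameters. If it is not, renaming the argument to `mask_image` would be the first thing to try.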

@sayakpaul I found a crappy bug in Diffusers.

I opened a PR on GitHub.