I’m encountering a persistent issue when running the StableDiffusionInpaintPipeline for an inpainting task. Despite passing inputs in the expected formats (both the image and mask are in PIL.Image.Image format with correct sizes), I keep receiving the following error:
ValueError: Input is in incorrect format. Currently, we only support <class 'PIL.Image.Image'>, <class 'numpy.ndarray'>, <class 'torch.Tensor'>
Here’s the code that triggers the error:
import torch
from PIL import Image

# Image and mask setup
image_pil = Image.fromarray(image_np)
mask_pil = Image.fromarray(black_mask).convert("L")

# Generator for reproducibility
generator = torch.Generator(device="cuda").manual_seed(0)

image = model["pipeline"](
    prompt=prompt,
    negative_prompt=IMG_INPAINTING_NEG_PROMPT,
    image=image_pil,   # PIL Image
    mask=mask_pil,     # Grayscale mask (mode "L")
    guidance_scale=8.0,
    num_inference_steps=50,
    generator=generator,
).images[0]
Image and Mask Details:
Image size: (512, 768), mode: RGB
Mask size: (512, 768), mode: L
The mask is binary (contains only 0 and 255 values).
I’ve also tried using a simple manually created mask to ensure that FastSAM-generated masks aren’t causing the issue, but I still get the same error.
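For reference, the manual mask I tested was built roughly like this (the rectangle coordinates below are just illustrative):

import numpy as np
from PIL import Image

# Hand-made binary mask: white (255) marks the region to inpaint.
# Note PIL size (512, 768) is (width, height), so the array shape is (768, 512).
manual_mask = np.zeros((768, 512), dtype=np.uint8)
manual_mask[100:400, 100:400] = 255  # illustrative rectangle
manual_mask_pil = Image.fromarray(manual_mask).convert("L")

print(manual_mask_pil.size, manual_mask_pil.mode)  # (512, 768) L
print(np.unique(manual_mask))                      # [  0 255]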
It looks like you're stuck here, but I think this may be a bug in Diffusers. The validation helper in Diffusers' image_processor.py looks like this:
def is_valid_image(image):
    return isinstance(image, PIL.Image.Image) or isinstance(image, (np.ndarray, torch.Tensor)) and image.ndim in (2, 3)
Maybe this is what was intended:
def is_valid_image(image):
    return isinstance(image, PIL.Image.Image) or (isinstance(image, (np.ndarray, torch.Tensor)) and image.ndim in (2, 3))
ndim is not an attribute of PIL.Image.Image.
As it stands, it should be possible to slip past this check by passing the inputs in NumPy format instead.
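Here's a quick standalone check of the parenthesized version, independent of Diffusers and with made-up test inputs, showing which formats get through:

import numpy as np
import PIL.Image
import torch

def is_valid_image(image):
    return isinstance(image, PIL.Image.Image) or (isinstance(image, (np.ndarray, torch.Tensor)) and image.ndim in (2, 3))

print(is_valid_image(PIL.Image.new("RGB", (64, 64))))         # True  (PIL short-circuits the `or`)
print(is_valid_image(np.zeros((64, 64), dtype=np.uint8)))     # True  (2D ndarray, e.g. a grayscale mask)
print(is_valid_image(np.zeros((64, 64, 3), dtype=np.uint8)))  # True  (3D ndarray)
print(is_valid_image(torch.zeros(1, 3, 64, 64)))              # False (4D tensor)
print(is_valid_image("not an image"))                         # False (unsupported type)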
@sayakpaul I found a crappy bug in Diffusers.