FluxInpaintPipeline returns an image with an irrelevant background

Hi.

This is my code for the flux inpaint pipeline:

import torch
from PIL import Image
from diffusers import FluxInpaintPipeline
from diffusers.utils import load_image

pipe = FluxInpaintPipeline.from_pretrained("black-forest-labs/FLUX.1-schnell", torch_dtype=torch.bfloat16)
pipe.to("cuda")
iter = 3

prompt = "A car parked by a beach, high quality, photorealistic"
img_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo.png"
mask_url = "https://raw.githubusercontent.com/CompVis/latent-diffusion/main/data/inpainting_examples/overture-creations-5sI6fQgYIuo_mask.png"
source = load_image(Image.open('car-001.png'))
mask = load_image(Image.open('car_black_and_white.png'))
image = pipe(prompt=prompt, height=2048, width=1024, image=source, mask_image=mask, strength=0.8, num_inference_steps=40, guidance_scale=9.0, num_images_per_prompt=1).images[0]
image.save(f"flux_car_{iter}_inpainting.png")

The car was originally parked in a parking lot. I am trying to inpaint a beach background, but it’s returning the same car inpainted into scenes similar to the original one.

This is the original:

These are the results:

Where am I going wrong? I haven’t had any luck in implementing controlnet and I’m not even sure if that would fix this problem.

perhaps

image[0].save(f"flux_car_{iter}_inpainting.png")

I don’t understand. Sorry. Could you break down your reasoning?

Sorry, I was mistaken. I was posting from my phone in bed…

So what do you think is the cause?

  • guidance_scale is too high
  • height is too large
  • perhaps images[-1] is correct
  • Flux-related code may be buggy at the moment
    etc.
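
The first two suspects in the list above are easy to rule out by retrying with milder settings. The sketch below is only one way to do that. fit_size is a hypothetical helper (not part of diffusers) that scales the requested size down so the longer side is at most 1024 pixels and rounds each side down to a multiple of 16. The 1024 cap, the multiple of 16, and the lower guidance_scale of 3.5 are all assumptions to experiment with, not values confirmed in this thread.

```python
def fit_size(width, height, max_side=1024, multiple=16):
    """Scale (width, height) down so the longer side is <= max_side,
    then round each side down to a multiple of `multiple`.

    Hypothetical helper; max_side and multiple are assumed defaults.
    """
    scale = min(1.0, max_side / max(width, height))
    new_w = max(multiple, int(width * scale) // multiple * multiple)
    new_h = max(multiple, int(height * scale) // multiple * multiple)
    return new_w, new_h


# The original call used width=1024, height=2048.
width, height = fit_size(1024, 2048)

# Keyword arguments for a retry; guidance_scale=3.5 is an arbitrary
# lower value to try, not a recommended setting.
retry_kwargs = dict(width=width, height=height, guidance_scale=3.5)
print(retry_kwargs)
```

These could then be passed as pipe(prompt=prompt, image=source, mask_image=mask, **retry_kwargs) together with the remaining arguments from the original call.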

Sorry. I’m sleepy; I myself am buggy.

Sorry, I was sleepwalking. I’m up. Maybe this is the mistake.

By default, this function returns a dictionary-like FluxPipelineOutput, not a tuple; if you set return_dict to False, it returns a plain tuple instead, and you can use it like a normal T2I pipeline.

(
    prompt: Union = None,
    prompt_2: Union = None,
    image: Union = None,
    mask_image: Union = None,
    masked_image_latents: Union = None,
    height: Optional = None,
    width: Optional = None,
    padding_mask_crop: Optional = None,
    strength: float = 0.6,
    num_inference_steps: int = 28,
    timesteps: List = None,
    guidance_scale: float = 7.0,
    num_images_per_prompt: Optional = 1,
    generator: Union = None,
    latents: Optional = None,
    prompt_embeds: Optional = None,
    pooled_prompt_embeds: Optional = None,
    output_type: Optional = 'pil',
    return_dict: bool = True,
    joint_attention_kwargs: Optional = None,
    callback_on_step_end: Optional = None,
    callback_on_step_end_tensor_inputs: List = ['latents'],
    max_sequence_length: int = 512
) → ~pipelines.flux.FluxPipelineOutput or tuple

return_dict (bool, optional, defaults to True) — Whether or not to return a ~pipelines.flux.FluxPipelineOutput instead of a plain tuple.
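
To make the two return shapes concrete, here is a minimal sketch of a helper that pulls the first image out of either one. It assumes, as the documentation above describes, that the default result exposes an .images list and that return_dict=False yields a plain tuple whose first element is that same list. first_image and the stand-in FakeOutput class are hypothetical names used only for illustration, so the snippet runs without loading a model.

```python
from dataclasses import dataclass, field


@dataclass
class FakeOutput:
    """Stand-in for the pipeline's output object (illustration only)."""
    images: list = field(default_factory=list)


def first_image(result):
    """Return the first image from a pipeline result of either shape."""
    if isinstance(result, tuple):
        # return_dict=False: assumed to be a tuple of the form (images,)
        return result[0][0]
    # return_dict=True (default): object with an .images attribute
    return result.images[0]


# Placeholder strings stand in for PIL images here.
print(first_image(FakeOutput(images=["img_a"])))
print(first_image((["img_a"],)))
```

With the real pipeline this would be first_image(pipe(...)), which means the .images[0] in the original code already matches the default return_dict=True.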