SDXL Image to Image, howto

Hi all

I've been using the SD1.5 image-to-image pipeline in diffusers and it's been working really well.

The idea is that I take a basic drawing and make it real based on the prompt.

I'm trying to move over to SDXL but I can't seem to get the image-to-image working.

I’m trying to do it the way the docs demonstrate but I get the exact same image back.

import torch
from diffusers import StableDiffusionXLImg2ImgPipeline

pipeimg = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
)

Calling it like this

imgtoimg = pipeimg(prompt=prompt, negative_prompt=negative_prompt,
                   image=sketch_image, generator=generator,
                   guidance_scale=2.5, num_inference_steps=25).images[0]
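
For reference, here is the same call with strength passed explicitly. This is just a guess on my part: strength is the parameter I'd normally tune for SD1.5 img2img, and 0.75 is an assumed value, not something I've confirmed is right for SDXL.

imgtoimg = pipeimg(prompt=prompt,
                   negative_prompt=negative_prompt,
                   image=sketch_image,
                   strength=0.75,  # assumed value; controls how much the input image is changed
                   generator=generator,
                   guidance_scale=2.5,
                   num_inference_steps=25).images[0]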

Any advice on how to make it work would be most welcome.

Thanks very much

EDIT:

I've tried loading the SDXL base model into StableDiffusionXLImg2ImgPipeline, but the results are strange: it returns essentially the same image, only degraded. It's clearly seeing something and trying to process it, but the results aren't nearly as good as with SD1.5.

I’m not sure where I’m going wrong, please help peeps.


Hi,

I've had the same issue and solved it with a workaround:

self.refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
)

And then I call the refiner for each image:

    def refine(self, prompt: str, negative_prompt: str, init_image: Image, latents) -> List[Image]:
        images = []
        # One refiner pass per latent returned by the base model
        for latent in latents:
            images.append(self.refiner(prompt=prompt,
                                        negative_prompt=negative_prompt,
                                        image=init_image,   # original image, same size as the output
                                        latents=latent,     # latent from the base model's .images
                                        num_inference_steps=self.n_steps,
                                        ).images[0])

        return images

Important:

  • init_image is the original image used in the base model; it has to be the exact size of the output: init_image = PIL.Image.open(init_image_path).convert("RGB")
  • the base model needs to return latents, not PIL images (i.e. call it with output_type="latent")
  • the 'latents' parameter passed to the refiner has to be the .images attribute of the object returned by the base model (of type diffusers.pipelines.pipeline_utils.ImagePipelineOutput)

The example in diffusers\pipelines\stable_diffusion_xl\pipeline_stable_diffusion_xl_img2img.py (line 52) helped me a lot…
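
Putting those pieces together, here is the whole flow as a self-contained sketch for a single image. It just mirrors the workaround above; the prompts, file name and step count are placeholders (the cat prompt is borrowed from this thread), and it assumes a CUDA GPU for fp16:

import torch
from PIL import Image
from diffusers import StableDiffusionXLImg2ImgPipeline, DiffusionPipeline

base = StableDiffusionXLImg2ImgPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-base-1.0", torch_dtype=torch.float16
).to("cuda")

refiner = DiffusionPipeline.from_pretrained(
    "stabilityai/stable-diffusion-xl-refiner-1.0",
    torch_dtype=torch.float16,
    use_safetensors=True,
    variant="fp16",
).to("cuda")

prompt = "a black cat, snow, cinematic"      # placeholder prompt
negative_prompt = "blurry, low quality"      # placeholder negative prompt

# init_image has to be the exact size of the desired output
init_image = Image.open("sketch.png").convert("RGB").resize((1024, 1024))

# Base pass: output_type="latent" makes .images hold latents instead of PIL images
base_latents = base(prompt=prompt,
                    negative_prompt=negative_prompt,
                    image=init_image,
                    num_inference_steps=40,
                    output_type="latent").images

# Refiner pass: image is the original init image, latents come from the base output
refined = refiner(prompt=prompt,
                  negative_prompt=negative_prompt,
                  image=init_image,
                  latents=base_latents,
                  num_inference_steps=40).images[0]
refined.save("refined.png")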

Thanks so much. I’m going to try it and get back to you.

Do you have a link to this? I can't seem to find it.

diffusers\pipelines\stable_diffusion_xl\pipeline_stable_diffusion_xl_img2img.py

Hi all, OK, based on the image-to-image example here:

https://huggingface.co/docs/diffusers/main/en/using-diffusers/sdxl#size-conditioning

I'm getting great results when I start from good images, but when I use finger-drawn images the results are terrible. The idea is to take the simple drawing and make it real. I get great results with SD1.5 img2img and I just can't figure this one out. Please see my images below as an example.

I'm resizing all images to 1024x1024 for best results.
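
For completeness, this is how the inputs are prepared before being passed to the pipeline (the file name is just an example):

from PIL import Image

# Load the finger-drawn sketch, force RGB, and resize to 1024x1024 before img2img
sketch_image = Image.open("finger_drawing.png").convert("RGB").resize((1024, 1024))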

[Image: Original]

[Image: Image to Image SDXL - Prompt: a black cat, snow, cinematic]

[Image: Original]

[Image: Image to Image SDXL - Prompt: A fantastical landscape]

[Image: Image to Image with SD1.5 - Prompt: A fantastical landscape]