[Stable Diffusion] Error in "In Painting" pipeline

CristoJV · September 17, 2022, 10:26am

Hello!
Why is the masking in the “In Painting” pipeline done on the latents rather than on the images decoded by the VAE?

latents = (init_latents_proper * mask) + (latents * (1 - mask))

If this is correct, how is the mask mapped into the latent space? Are the original pixel locations (clustered) reflected at the expected positions in the latent space?

Shouldn't the algorithm be the following?

Retrieve the latents.
Decode the latents.
Mix both images (original and decoded version) using the mask.
Encode the image to obtain the new latents.

Instead of:

Retrieve the latents.
Mix the original latents (with noise added corresponding to the timestep) with the latents.

anton-l · September 21, 2022, 8:19pm

Hi @CristoJV! As the diffusion process for Stable Diffusion works exclusively with the VAE latents, the masks received by the inpainting pipeline are resampled from 512x512 to 64x64 so that they can mask the latents.
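
A minimal sketch of that idea, using NumPy arrays in place of real tensors. The nearest-neighbour downsampling by striding, the 4-channel 64x64 latent shape, and the helper names are assumptions made for illustration, not the pipeline's actual code. The mask convention (1 keeps the original, 0 is repainted) follows the line quoted in the first post.

```python
import numpy as np

def downsample_mask(mask, factor=8):
    """Nearest-neighbour downsample of a (H, W) mask by an integer factor (hypothetical helper)."""
    return mask[::factor, ::factor]

def blend_latents(init_latents, latents, mask):
    """Keep init_latents where mask == 1 and the generated latents where mask == 0."""
    return init_latents * mask + latents * (1.0 - mask)

# Toy data: a 512x512 mask whose left half keeps the original image.
pixel_mask = np.zeros((512, 512), dtype=np.float32)
pixel_mask[:, :256] = 1.0

latent_mask = downsample_mask(pixel_mask)  # shape (64, 64)

# Stand-ins for the noised original latents and the current generated latents.
init_latents = np.ones((4, 64, 64), dtype=np.float32)
latents = np.zeros((4, 64, 64), dtype=np.float32)

# The (64, 64) mask broadcasts over the channel axis.
blended = blend_latents(init_latents, latents, latent_mask)
```

In this sketch each latent cell stands for an 8x8 block of pixels, so the spatial layout of the mask is kept but its edges are only accurate to about 8 pixels.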

CristoJV · September 22, 2022, 1:05pm

Hi @anton-l! Thank you for replying.

Should the inpainting pipeline be upgraded with a further step that mixes the original image and the generated one, but in the image domain?

The issue is that, although the image latents are preserved after the masking, the VAE’s encoding and decoding are lossy. For example, faces that were not modified (not covered by the mask) come back looking a little uglier or distorted. If a further step mixed both images using the mask after generation, the results might improve without interfering with the pipeline’s operation.

Emulator000 · April 22, 2023, 11:46am

Hi @CristoJV, I’m using SD 1.5, and in my code I just added a post-processing step that mixes the original untouched image with the result decoded from the VAE, using the original (not downscaled) mask, and I get a better result.
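
A post-processing step of this kind could look roughly like the sketch below. It is not the code actually used here: the function name, the float images in [0, 1] with shape (H, W, 3), and the mask convention (1 marks the repainted area) are all assumptions for illustration.

```python
import numpy as np

def composite_with_original(original, decoded, inpaint_mask):
    """Take decoded pixels where inpaint_mask == 1 and original pixels elsewhere.

    original, decoded: float arrays of shape (H, W, 3) in [0, 1].
    inpaint_mask: array of shape (H, W); 1 = repainted area (assumed convention).
    """
    m = inpaint_mask[..., None].astype(np.float32)  # add a channel axis for broadcasting
    return decoded * m + original * (1.0 - m)

# Toy example: a flat original, a flat "decoded" image, and a square mask.
original = np.full((16, 16, 3), 0.2, dtype=np.float32)
decoded = np.full((16, 16, 3), 0.8, dtype=np.float32)
inpaint_mask = np.zeros((16, 16), dtype=np.float32)
inpaint_mask[4:12, 4:12] = 1.0

result = composite_with_original(original, decoded, inpaint_mask)
```

Outside the mask the result is pixel-for-pixel the original image, so the VAE round trip no longer touches areas such as unmasked faces.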

By the way, the output from the VAE also differs in saturation and brightness, and a slight difference between the inpainted area and the original image is noticeable.

I’m guessing that the VAE’s encode-decode process makes the image lose its original properties.

An idea that I’ll try for sure is dilating the original mask a bit in order to keep some border information from the decoded latent, and then blending the luminosity of the decoded latent and the original image; with this trick I think I could achieve a better result.
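
One possible reading of that idea, as an untested-in-practice sketch: dilate the mask with a square window, then shift the decoded image so that its mean brightness over the untouched area matches the original. Interpreting "blend the luminosity" as a single global offset is an assumption, as are both helper names and the image format (float arrays in [0, 1]).

```python
import numpy as np

def dilate(mask, radius=1):
    """Binary dilation of a (H, W) mask with a square (2*radius+1) window, NumPy only."""
    padded = np.pad(mask, radius, mode="constant")
    out = np.zeros_like(mask)
    h, w = mask.shape
    for dy in range(2 * radius + 1):
        for dx in range(2 * radius + 1):
            out = np.maximum(out, padded[dy:dy + h, dx:dx + w])
    return out

def match_brightness(decoded, original, keep_mask):
    """Shift decoded so its mean over the kept area (keep_mask > 0) equals the original's."""
    keep = keep_mask > 0
    offset = original[keep].mean() - decoded[keep].mean()
    return np.clip(decoded + offset, 0.0, 1.0)

# Toy example: one masked pixel grows into a 5x5 square.
inpaint_mask = np.zeros((11, 11), dtype=np.float32)
inpaint_mask[5, 5] = 1.0
dilated = dilate(inpaint_mask, radius=2)

original = np.full((11, 11, 3), 0.5, dtype=np.float32)
decoded = np.full((11, 11, 3), 0.4, dtype=np.float32)  # slightly darker, as after a VAE round trip
corrected = match_brightness(decoded, original, keep_mask=1.0 - dilated)
```

The corrected image would then be composited with the original using the dilated mask. The sketch assumes the kept area is not empty.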

In any case, I agree with you that the process should be upgraded in order to get better results.

For what it’s worth, to me the inpainting process with Diffusers is just not usable for images with people’s faces; the distortion is fairly aggressive.

rauln · June 29, 2023, 10:23pm

Same problem here. Has anyone found a fix?

system · March 7, 2024, 12:52pm

This topic was automatically closed 12 hours after the last reply. New replies are no longer allowed.
