Hi,
I am confused. In the StableDiffusionInpaintPipelineV2, why does the linear interpolation between the denoised latents and the initial latents, weighted by the mask, only happen when the UNet has 4 input channels and not when it has 9? Here is the line
Wouldn’t it make sense to add noise only where mask == 1 and leave the rest as the initial latents, since we don’t need to generate any content in those regions?
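
To make the question concrete, here is a minimal sketch of the two operations I mean. This is not the pipeline's actual code: the function names, the `alpha_bar` parameter, and the tensor shapes are all made up for illustration, and I use NumPy instead of PyTorch just to keep it self-contained.

```python
import numpy as np


def blend_latents(denoised, initial, mask):
    """Mask-weighted linear interpolation (hypothetical helper).

    Returns `denoised` where mask == 1 and `initial` where mask == 0.
    """
    return mask * denoised + (1.0 - mask) * initial


def noise_only_in_mask(initial, noise, mask, alpha_bar):
    """Noise the latents only inside the mask (hypothetical helper).

    `alpha_bar` stands in for a scheduler's cumulative alpha at some
    timestep; the standard forward-diffusion formula is assumed.
    """
    noised = np.sqrt(alpha_bar) * initial + np.sqrt(1.0 - alpha_bar) * noise
    # Outside the mask, keep the clean initial latents untouched.
    return blend_latents(noised, initial, mask)


rng = np.random.default_rng(0)
initial = rng.standard_normal((1, 4, 8, 8))   # latents of the original image
denoised = rng.standard_normal((1, 4, 8, 8))  # stand-in for a UNet/scheduler output
noise = rng.standard_normal((1, 4, 8, 8))

# Single-channel mask broadcast over the 4 latent channels;
# 1 marks the region to regenerate.
mask = np.zeros((1, 1, 8, 8))
mask[..., 2:6, 2:6] = 1.0

blended = blend_latents(denoised, initial, mask)
partially_noised = noise_only_in_mask(initial, noise, mask, alpha_bar=0.5)
```

In both cases everything outside the mask stays exactly equal to the initial latents, which is the behaviour I would have expected regardless of how many input channels the UNet has.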