Why does using ground-truth noise in a diffusion model not work?

Phoenix0617 · August 1, 2023, 4:16am

I have recently been learning about diffusion models. At first, I did not put much effort into training the U-Net. As we know, the U-Net is trained to predict noise according to the loss function:

\frac{1}{2} \text{const}(\alpha_t) \|\epsilon_\theta - \epsilon_0 \|^2_2.
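For reference, here is a sketch of the definitions this loss is usually paired with. It assumes the standard DDPM notation, which the post itself does not spell out: \bar\alpha_t is the cumulative product of the \alpha_s, \epsilon_0 is the noise in the closed-form forward process from x_0 to x_t, and \mu_\theta is the mean of the sampling step.

```latex
x_t = \sqrt{\bar\alpha_t}\, x_0 + \sqrt{1 - \bar\alpha_t}\, \epsilon_0,
\qquad \epsilon_0 \sim \mathcal{N}(0, I),
\qquad \bar\alpha_t = \prod_{s=1}^{t} \alpha_s

\mu_\theta(x_t, t) = \frac{1}{\sqrt{\alpha_t}}
\left( x_t - \frac{1 - \alpha_t}{\sqrt{1 - \bar\alpha_t}}\, \epsilon_\theta(x_t, t) \right)
```

Under these assumptions, \epsilon_0 is defined relative to x_0 for a given t, not relative to x_{t-1}.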

Therefore, I am trying a very simple test to verify the whole framework: I record \epsilon_0 at each step of adding noise and substitute it into the sampling process in place of the U-Net's prediction.

In my opinion, since the U-Net is trained to predict \epsilon_0, sampling should still work if I use the ground-truth noise instead. Although this process may lose variation, I expect it to reconstruct the original image.
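The test described above can be sketched in NumPy as a toy experiment. This is only one reading of the setup, not the poster's actual code: the linear beta schedule, the 8x8 random "image", the step-by-step forward process, and the helper names `reverse` and `cumulative_noise` are all assumptions made for illustration. It runs a noise-free reverse pass twice, once with the noise recorded at each forward step and once with the cumulative noise implied by the closed-form relation between x_t and x_0, and measures how far each result is from the original.

```python
import numpy as np

rng = np.random.default_rng(0)

T = 300
betas = np.linspace(1e-4, 0.02, T)   # linear schedule (assumed, not from the post)
alphas = 1.0 - betas
alpha_bars = np.cumprod(alphas)

x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for a normalized image

# Forward process, one step at a time, recording the noise drawn at each step.
x = x0
step_noise = []
for t in range(T):
    eps = rng.standard_normal(x0.shape)
    step_noise.append(eps)
    x = np.sqrt(alphas[t]) * x + np.sqrt(betas[t]) * eps
x_T = x


def reverse(x_start, noise_fn):
    """Reverse pass that applies only the DDPM mean update (no noise is added)."""
    x = x_start
    for t in reversed(range(T)):
        eps = noise_fn(x, t)
        x = (x - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps) / np.sqrt(alphas[t])
    return x


def cumulative_noise(x, t):
    """Noise that links the current x to x0 in the closed-form forward process."""
    return (x - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])


# Variant A: substitute the per-step noise recorded during the forward pass.
x_a = reverse(x_T, lambda x, t: step_noise[t])

# Variant B: substitute the cumulative noise relative to x0 (needs x0, so it is an oracle).
x_b = reverse(x_T, cumulative_noise)

err_a = np.abs(x_a - x0).max()
err_b = np.abs(x_b - x0).max()
print("max error with per-step noise:  ", err_a)
print("max error with cumulative noise:", err_b)
```

In this sketch, variant A does not invert the forward step because the mean update scales the noise by \beta_t / \sqrt{1 - \bar\alpha_t}, whereas the forward step added it with weight \sqrt{\beta_t}; the leftover noise accumulates over the 300 steps. Variant B is only a sanity check, since it needs x_0 to compute the noise. Whether this mismatch is what happened in the original experiment depends on how the noise was recorded there.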

However, the results suggest I am wrong.

With T=300, the sampling process looks like this: [image]

The information is totally lost!

So why does this not work? According to the formulas, it should. What am I missing?
