I have been learning about diffusion models recently. At first, I did not put much effort into training the U-Net. As we know, the U-Net is trained to predict the added noise by minimizing the usual noise-prediction loss.
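For reference, the simplified objective from the DDPM paper (Ho et al., 2020) is shown below. It is presumably the loss meant here, with $\epsilon$ corresponding to $\epsilon_0$ in this post. The notation follows that paper: $\alpha_t = 1-\beta_t$ and $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$.

$$
L_{\text{simple}}(\theta) = \mathbb{E}_{t,\,x_0,\,\epsilon \sim \mathcal{N}(0,\mathbf{I})}\left[\left\lVert \epsilon - \epsilon_\theta\!\left(\sqrt{\bar\alpha_t}\,x_0 + \sqrt{1-\bar\alpha_t}\,\epsilon,\; t\right)\right\rVert^2\right]
$$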
Therefore, I tried a very simple test to verify the whole framework: I recorded the noise $\epsilon_0$ at each step of the noising process and substituted it into the sampling process in place of the U-Net's prediction.
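Assuming the standard DDPM ancestral sampler (Algorithm 2 in Ho et al., 2020), the noise estimate enters each reverse step as shown below. In this test, the recorded noise takes the place of $\epsilon_\theta(x_t, t)$. The notation is that paper's: $\alpha_t = 1-\beta_t$, $\bar\alpha_t = \prod_{s=1}^{t}\alpha_s$, and $\sigma_t^2 = \beta_t$, which is one of its two variance choices.

$$
x_{t-1} = \frac{1}{\sqrt{\alpha_t}}\left(x_t - \frac{1-\alpha_t}{\sqrt{1-\bar\alpha_t}}\,\epsilon_\theta(x_t, t)\right) + \sigma_t z,\qquad z \sim \mathcal{N}(0,\mathbf{I}) \text{ for } t>1,\quad z = 0 \text{ for } t = 1
$$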
My reasoning is that, since the U-Net is trained to predict $\epsilon_0$, using the ground-truth noise instead should still work. Although the samples would lose variation, the process should reconstruct the original image.
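A minimal NumPy sketch of this kind of oracle test follows. It assumes the standard DDPM formulation with a linear $\beta$ schedule from $10^{-4}$ to $0.02$ and $\sigma_t^2 = \beta_t$. `x0` is a random stand-in for a normalized image. The helpers `q_sample`, `oracle_eps`, and `p_sample` are hypothetical names. `oracle_eps` plays the role of a perfect U-Net by computing the noise that is consistent with the current `x_t` and the true `x0` under the closed-form forward process.

```python
import numpy as np

# Assumed setup: linear beta schedule as in Ho et al. (2020), T = 300 steps.
T = 300
betas = np.linspace(1e-4, 0.02, T)   # beta_1 .. beta_T stored at index 0 .. T-1
alphas = 1.0 - betas                 # alpha_t = 1 - beta_t
alpha_bars = np.cumprod(alphas)      # alpha_bar_t = prod_{s<=t} alpha_s

rng = np.random.default_rng(0)
x0 = rng.uniform(-1.0, 1.0, size=(8, 8))  # stand-in for an image scaled to [-1, 1]


def q_sample(x0, t, eps):
    """Closed-form forward process: x_t = sqrt(abar_t) * x0 + sqrt(1 - abar_t) * eps."""
    return np.sqrt(alpha_bars[t]) * x0 + np.sqrt(1.0 - alpha_bars[t]) * eps


def oracle_eps(x_t, t, x0):
    """Hypothetical 'perfect U-Net': the eps that maps x0 to the current x_t."""
    return (x_t - np.sqrt(alpha_bars[t]) * x0) / np.sqrt(1.0 - alpha_bars[t])


def p_sample(x_t, t, eps_hat, rng):
    """One DDPM reverse step (Algorithm 2) with sigma_t^2 = beta_t."""
    mean = (x_t - betas[t] / np.sqrt(1.0 - alpha_bars[t]) * eps_hat) / np.sqrt(alphas[t])
    if t == 0:  # final step (t = 1 in the paper): no fresh noise is added
        return mean
    return mean + np.sqrt(betas[t]) * rng.standard_normal(x_t.shape)


# Start from x_T obtained by noising x0, then run the reverse chain with the oracle.
x = q_sample(x0, T - 1, rng.standard_normal(x0.shape))
for t in reversed(range(T)):
    x = p_sample(x, t, oracle_eps(x, t, x0), rng)
x_rec = x

print("max abs reconstruction error:", np.max(np.abs(x_rec - x0)))
```

In this sketch the oracle is recomputed from the current `x` at every reverse step, because the fresh noise added inside `p_sample` changes `x` between steps.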
However, the results show that I am wrong.
When $T = 300$, the sampling process looks like this:
The information is totally lost!
So why does this not work? According to the formulas, it should. What am I missing?