Why does using ground-truth noise in a diffusion model not work?

I have recently been learning about diffusion models. At first, I did not put much effort into training the U-Net. As we know, the U-Net is trained to predict the noise by minimizing the loss function:

\frac{1}{2} \text{const}(\alpha_t) \|\epsilon_\theta - \epsilon_0 \|^2_2.
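
For concreteness, the loss above can be sketched in a few lines of Python. This is only an illustration on plain lists: the function name `noise_prediction_loss` and the `weight` argument (standing in for \text{const}(\alpha_t), whose exact form is not given here) are hypothetical.

```python
def noise_prediction_loss(eps_pred, eps_true, weight=1.0):
    """Half the weighted squared L2 distance between predicted and true noise.

    `weight` is a stand-in for const(alpha_t); its actual value is an assumption.
    """
    squared_error = sum((p - q) ** 2 for p, q in zip(eps_pred, eps_true))
    return 0.5 * weight * squared_error


# A perfect prediction gives zero loss.
print(noise_prediction_loss([0.3, -1.2], [0.3, -1.2]))
```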

Therefore, I tried a very simple test to verify the whole framework: I recorded \epsilon_0 at each step of adding noise and substituted it into the sampling process in place of the U-Net's prediction.
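
The test described above can be sketched roughly as follows. This is a toy reconstruction under stated assumptions, not the original code: it uses a short 1-D signal instead of an image, a hypothetical linear beta schedule, reads "recorded at each step" as the fresh noise drawn in each forward step, and applies only the mean of the standard DDPM reverse update (the extra sampling noise term is left out). All function names are made up for illustration.

```python
import math
import random


def linear_betas(T, beta_start=1e-4, beta_end=0.02):
    """Hypothetical linear beta schedule (an assumption, not from the post)."""
    if T == 1:
        return [beta_start]
    step = (beta_end - beta_start) / (T - 1)
    return [beta_start + i * step for i in range(T)]


def forward_record(x0, betas, rng):
    """Add noise step by step and record the noise drawn at each step."""
    x = list(x0)
    recorded = []
    for beta in betas:
        eps = [rng.gauss(0.0, 1.0) for _ in x]
        # x_t = sqrt(alpha_t) * x_{t-1} + sqrt(1 - alpha_t) * eps, with alpha_t = 1 - beta_t
        x = [math.sqrt(1.0 - beta) * xi + math.sqrt(beta) * ei
             for xi, ei in zip(x, eps)]
        recorded.append(eps)
    return x, recorded


def reverse_with_recorded(xT, betas, recorded):
    """DDPM mean update, with recorded noise substituted for the U-Net output."""
    alpha_bars = []
    prod = 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    x = list(xT)
    for t in reversed(range(len(betas))):
        coef = betas[t] / math.sqrt(1.0 - alpha_bars[t])
        scale = 1.0 / math.sqrt(1.0 - betas[t])
        # x_{t-1} = (x_t - (1 - alpha_t) / sqrt(1 - alpha_bar_t) * eps) / sqrt(alpha_t)
        x = [scale * (xi - coef * ei) for xi, ei in zip(x, recorded[t])]
    return x


rng = random.Random(0)
x0 = [0.5, -0.25, 1.0, 0.0]      # toy "image"
betas = linear_betas(300)        # T = 300
xT, recorded = forward_record(x0, betas, rng)
x_rec = reverse_with_recorded(xT, betas, recorded)
print("original:     ", x0)
print("reconstructed:", x_rec)
```

Comparing the two printed vectors shows whether substituting the recorded noise recovers the starting signal in this toy setting.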

My reasoning is that, since the U-Net is trained to predict \epsilon_0, substituting the ground-truth noise should work at least as well. This process may lose variation, but I would expect it to reconstruct the original image.

However, the results imply I am wrong.

When T=300, the sampling process looks like this:

[image: sampling process for T=300]

The information is totally lost!

So, why does this not work? According to the formulation, it should. What am I missing?