Hello, I’ve run a few experiments in Hugging Face’s Google Colab, and some questions have arisen.

If I add noise to an image (from the distribution the model was trained on) until it becomes an isotropic Gaussian (concretely, I add T = 1000 steps of noise to the image), I’d expect the model to denoise it back to that same image, since I believed the Gaussian X_T is a sample that represents the original image, like an embedding. Is that not so? When I run this through the model, it produces a realistic image, but not the same one I used as input. Maybe I’ve set something up wrong, but is this behaviour expected?
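
For context, this is how I understand the forward process: x_t = sqrt(ᾱ_t)·x_0 + sqrt(1 − ᾱ_t)·ε. Here is a quick sketch of the coefficients I get (assuming the standard DDPM linear beta schedule, 1e-4 to 0.02 over 1000 steps — not necessarily the exact schedule of the checkpoint I used):

```python
import numpy as np

# Assumption: standard DDPM linear beta schedule, 1e-4 -> 0.02 over T = 1000
T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)

# Closed form of the forward process:
#   x_t = sqrt(alpha_bar_t) * x_0 + sqrt(1 - alpha_bar_t) * eps
signal = np.sqrt(alpha_bar)        # coefficient on the original image x_0
noise = np.sqrt(1.0 - alpha_bar)   # coefficient on the Gaussian noise eps

print(f"t=300:  signal={signal[299]:.3f}, noise={noise[299]:.3f}")
print(f"t=1000: signal={signal[999]:.5f}, noise={noise[999]:.5f}")
```

If I’m reading this right, at t ≈ 300 the coefficient on x_0 is still around 0.6, so most of the image survives, while at t = 1000 it is below 0.01, i.e. x_T is essentially pure noise.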

Then I tried adding only 200–300 steps of noise and denoising that, and that setup works as intended: the model actually reverses the noise back to almost exactly the original image.

My question is: what is happening here? Why does adding all T steps of noise make the model produce an “arbitrary” image, while adding fewer noise steps recovers the original? Intuitively it makes sense, but I thought the point was that even after all T steps of noise we should be able to get back exactly the original image, not an arbitrary one.

Is there some big gap in my understanding of diffusion models?

Also, if I add noise to the image with two different random seeds, I know both will end up as isotropic Gaussian samples (after all 1000 forward diffusion steps), but they will be different samples, since they came from different seeds, correct? Does that mean one image can have many different “embeddings”? That said, I’m still not convinced these serve as embeddings at all, since they don’t reproduce the original, as mentioned above.
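
To check the two-seeds point numerically, I jumped straight to t = T with the closed form, using two different noise seeds (again assuming the linear beta schedule; `x0` here is just a random stand-in for an image):

```python
import numpy as np

T = 1000
betas = np.linspace(1e-4, 0.02, T)
alpha_bar = np.cumprod(1.0 - betas)
c = np.sqrt(alpha_bar[-1])        # tiny coefficient on x_0 at t = T
s = np.sqrt(1.0 - alpha_bar[-1])  # coefficient on the noise

x0 = np.random.default_rng(0).standard_normal(100_000)  # stand-in "image"

# Same image, two different noise seeds
eps1 = np.random.default_rng(1).standard_normal(x0.shape)
eps2 = np.random.default_rng(2).standard_normal(x0.shape)
xT1 = c * x0 + s * eps1
xT2 = c * x0 + s * eps2

print("corr(xT1, x0): ", np.corrcoef(xT1, x0)[0, 1])
print("corr(xT1, xT2):", np.corrcoef(xT1, xT2)[0, 1])
```

Both end up as (near-)isotropic Gaussians that are essentially uncorrelated with x_0 and with each other, which is exactly what made me doubt the “embedding” interpretation.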

Thanks in advance, these questions have been really bugging me.