Could there be a "remove noise" function to remove noise from noisy_latents, given the noise and the timestep?

When using “epsilon” prediction mode, I notice that the model is trained to predict a pre-defined noise sample drawn from a normal distribution. Since both the noise and the noisy_latents are too ambiguous for any semantic loss (like measuring the CLIP feature distance between the generated image and the text prompt) during training, I wonder if there’s a way to directly subtract the noise from the noisy_latents (instead of denoising step by step), ending up with a clean latent without noise?
I guess this is feasible because noisy_latents looks like a weighted sum of latents and noise inside the scheduler.add_noise function, but my math isn’t good enough to invert this process. Could some kind-hearted soul help me? :slight_smile:
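For what it’s worth, here is the kind of inversion I have in mind — a minimal sketch, assuming a DDPM-style scheduler whose add_noise computes noisy = sqrt(ᾱ_t) · latents + sqrt(1 − ᾱ_t) · noise (with ᾱ_t = alphas_cumprod[t]). The function name remove_noise is hypothetical, not part of any library:

```python
import torch

def remove_noise(noisy_latents, noise, timesteps, alphas_cumprod):
    # Hypothetical inverse of a DDPM-style add_noise, which (under the
    # assumption above) computes:
    #   noisy = sqrt(a_t) * latents + sqrt(1 - a_t) * noise
    # Solving for latents gives:
    #   latents = (noisy - sqrt(1 - a_t) * noise) / sqrt(a_t)
    a_t = alphas_cumprod[timesteps].flatten()
    # Broadcast a_t from shape (batch,) to (batch, 1, 1, ...) so it
    # multiplies per-sample across the latent dimensions.
    while a_t.dim() < noisy_latents.dim():
        a_t = a_t.unsqueeze(-1)
    return (noisy_latents - (1 - a_t).sqrt() * noise) / a_t.sqrt()
```

If the scheduler really does use this closed form, this recovers the original latents in one step given the *true* noise; during training one would feed in the model’s *predicted* epsilon instead, so the result is only the model’s one-shot estimate of the clean latent, not a proper sample.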