How much does the initial noise influence the output quality?

Hello everyone

I’m using the diffusers library for an inpainting task on brain MRI.
I follow two approaches: one uses a conditional UNet2D model, and one uses an unconditional UNet2D model with the RePaint pipeline for inference. Both use DDIM for inference.
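For the unconditional approach, inference is roughly a RePaint-style DDIM loop. The following is only a simplified sketch of the idea, not the exact code from DDIMInpaintPipeline (the checkpoint path, mask convention and step count are placeholders):

```python
import torch
from diffusers import DDIMScheduler, UNet2DModel

# Placeholder checkpoint path; in my repo the model comes from Training.py.
unet = UNet2DModel.from_pretrained("path/to/unconditional_unet").eval()
scheduler = DDIMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)

@torch.no_grad()
def inpaint(image, mask, generator=None):
    """image, mask: tensors of shape (B, C, H, W); mask == 1 marks the region to fill."""
    x = torch.randn(image.shape, generator=generator).to(image.device)  # initial noise x_T
    for t in scheduler.timesteps:
        noise_pred = unet(x, t).sample
        x = scheduler.step(noise_pred, t, x).prev_sample
        # Re-impose the known region at (roughly) the current noise level.
        # The real RePaint algorithm additionally uses resampling jumps.
        noise = torch.randn(image.shape, generator=generator).to(image.device)
        known = scheduler.add_noise(image, noise, t.unsqueeze(0))
        x = mask * x + (1 - mask) * known
    return x
```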

The training looks good to me. I’m using a batch size of 4, 100 epochs of training, and 880 images to train on. The training loss goes down to 5e-4 and the validation loss to around 0.001. (I would like to post pictures, but as a new user I can only post one image?!)

At the beginning of my evaluation I set the seed for the evaluation dataloader and for the initial noise of the diffusion process. I observed that the quality of my output depends heavily on the initial noise (especially whether the noise in the background gets removed).
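Concretely, the evaluation setup looks roughly like this (simplified sketch; eval_dataset, the batch size and the noise shape are placeholders for my actual data):

```python
import torch
from torch.utils.data import DataLoader

def make_eval_setup(eval_dataset, seed=0, noise_shape=(4, 1, 256, 256)):
    """Seeded eval dataloader plus a fixed initial noise x_T that is reused
    for every evaluated image."""
    loader = DataLoader(eval_dataset, batch_size=4, shuffle=True,
                        generator=torch.Generator().manual_seed(seed))
    initial_noise = torch.randn(noise_shape,
                                generator=torch.Generator().manual_seed(seed))
    return loader, initial_noise
```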

Here are four diffused pictures with different initial noise. The top-right picture is a good one and the two on the left are bad ones. When I use the same initial noise but another image to inpaint, the quality stays the same.


This surprised me a lot. I think I have a bug somewhere in my setup (e.g. that I seed the training procedure and therefore the model only learns a narrow range of the noise distribution). But I already spent days searching for it and didn’t find anything…
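For reference, this is what I understand a diffusers-style training step should look like, with fresh noise and fresh timesteps drawn on every batch (sketch only; model, noise_scheduler and clean_images are placeholders for my objects). Fixing a generator or calling torch.manual_seed inside this loop would be exactly the kind of bug I’m suspecting:

```python
import torch
import torch.nn.functional as F

def training_step(model, noise_scheduler, clean_images):
    """One denoising training step. Noise and timesteps must be sampled fresh
    on every call, otherwise the model only sees a narrow slice of the noise
    distribution."""
    noise = torch.randn_like(clean_images)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (clean_images.shape[0],), device=clean_images.device,
    )
    noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)
    noise_pred = model(noisy_images, timesteps).sample
    return F.mse_loss(noise_pred, noise)
```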

Have you ever observed similar behavior, or do you have ideas about what the problem could be? I’m starting to get stumped. Any thoughts are appreciated!

If you’re interested, the code is on GitHub: Link.
Because I can’t tell what the problem is, I can’t point you to a specific part of the code…
In the repository you can find different Jupyter notebooks, e.g. lesion_filling_unconditioned_repaint.ipynb. The core Python code is inside the folder “custom_modules”. The relevant files are “Training.py”, “TrainingConditional”, “Evaluation2D.py”, “Evaluation2DFilling.py” and “DDIMInpaintPipeline.py”.

Best regards,
Vinzenz

After a few weeks I could solve it. The problem was that I just needed a longer training time: 1 day was not enough, it needed around 3 days.

I thought the model had converged based on the loss progression. The thing is that different timesteps have different learning progressions. While the larger timesteps (e.g. 980) have a smaller error (because they can learn almost the identity for noise prediction), the smaller timesteps (e.g. 30) have a bigger error. Looking at the loss curve we can see spikes resulting from the different timesteps. While the smaller timesteps converged at some point, the larger timesteps were still learning and had not generalized. If you’re interested, I documented this in the report of my master’s thesis: Thesis_Diffusion_Lesions/08_Report/Report.pdf at main · vinzenzuhr/Thesis_Diffusion_Lesions · GitHub
Figure A.5 in the Appendix shows the loss of different timesteps.
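If you want to see this effect yourself, a simple way is to log the loss per timestep bucket instead of only the overall average. A rough sketch (placeholder names, not the exact code from my repo):

```python
import torch
import torch.nn.functional as F
from collections import defaultdict

@torch.no_grad()
def loss_per_timestep_bucket(model, noise_scheduler, clean_images, n_buckets=10):
    """MSE noise-prediction loss grouped into timestep buckets (0-99, 100-199, ...).
    The overall average hides that small and large timesteps converge at very
    different speeds."""
    T = noise_scheduler.config.num_train_timesteps
    noise = torch.randn_like(clean_images)
    timesteps = torch.randint(0, T, (clean_images.shape[0],), device=clean_images.device)
    noisy = noise_scheduler.add_noise(clean_images, noise, timesteps)
    pred = model(noisy, timesteps).sample
    per_sample = F.mse_loss(pred, noise, reduction="none").mean(dim=(1, 2, 3))
    buckets = defaultdict(list)
    for t, l in zip(timesteps.tolist(), per_sample.tolist()):
        buckets[t * n_buckets // T].append(l)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}
```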

Best,
Vinzenz
