How much does the initial noise influence the output quality?

Hello everyone

I’m using the diffusers library for an inpainting task on brain MRI.
I follow two approaches: one uses a conditional UNet2D model, and one uses an unconditional UNet2D model with the RePaint pipeline for inference. Both use DDIM for inference.
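For the unconditional approach, inference is roughly a RePaint-style DDIM loop. The following is only a simplified sketch of the idea, not the exact code from DDIMInpaintPipeline (the checkpoint path, mask convention and step count are placeholders):

```python
import torch
from diffusers import DDIMScheduler, UNet2DModel

# Placeholder checkpoint path; in my repo the model comes from Training.py.
unet = UNet2DModel.from_pretrained("path/to/unconditional_unet").eval()
scheduler = DDIMScheduler(num_train_timesteps=1000)
scheduler.set_timesteps(50)

@torch.no_grad()
def inpaint(image, mask, generator=None):
    """image, mask: tensors of shape (B, C, H, W); mask == 1 marks the region to fill."""
    x = torch.randn(image.shape, generator=generator).to(image.device)  # initial noise x_T
    for t in scheduler.timesteps:
        noise_pred = unet(x, t).sample
        x = scheduler.step(noise_pred, t, x).prev_sample
        # Re-impose the known region at (roughly) the current noise level.
        # The real RePaint algorithm additionally uses resampling jumps.
        noise = torch.randn(image.shape, generator=generator).to(image.device)
        known = scheduler.add_noise(image, noise, t.unsqueeze(0))
        x = mask * x + (1 - mask) * known
    return x
```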

The training looks good to me. I’m using a batch size of 4, 100 epochs of training, and 880 images to train on. The training loss goes down to 5e-4 and the validation loss to around 0.001. (I would like to post pictures, but as a new user I can only post one image?!)

At the beginning of my evaluation I set the seed for the evaluation dataloader and for the initial noise of the diffusion process. I observed that the quality of my output depends heavily on the initial noise (especially whether the noise in the background gets removed).
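Concretely, the evaluation setup looks roughly like this (simplified sketch; eval_dataset, the batch size and the noise shape are placeholders for my actual data):

```python
import torch
from torch.utils.data import DataLoader

def make_eval_setup(eval_dataset, seed=0, noise_shape=(4, 1, 256, 256)):
    """Seeded eval dataloader plus a fixed initial noise x_T that is reused
    for every evaluated image."""
    loader = DataLoader(eval_dataset, batch_size=4, shuffle=True,
                        generator=torch.Generator().manual_seed(seed))
    initial_noise = torch.randn(noise_shape,
                                generator=torch.Generator().manual_seed(seed))
    return loader, initial_noise
```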

Here are four diffused pictures with different initial noise. The top-right picture is a good one and the two on the left are bad ones. When I use the same initial noise but another image to inpaint, the quality stays the same.


This surprised me a lot. I think I have a bug somewhere in my setup (e.g. that I seed the training procedure and therefore the model only learns a narrow range of the noise distribution). But I already spent days searching for it and didn’t find anything…
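For reference, this is what I understand a diffusers-style training step should look like, with fresh noise and fresh timesteps drawn on every batch (sketch only; model, noise_scheduler and clean_images are placeholders for my objects). Fixing a generator or calling torch.manual_seed inside this loop would be exactly the kind of bug I’m suspecting:

```python
import torch
import torch.nn.functional as F

def training_step(model, noise_scheduler, clean_images):
    """One denoising training step. Noise and timesteps must be sampled fresh
    on every call, otherwise the model only sees a narrow slice of the noise
    distribution."""
    noise = torch.randn_like(clean_images)
    timesteps = torch.randint(
        0, noise_scheduler.config.num_train_timesteps,
        (clean_images.shape[0],), device=clean_images.device,
    )
    noisy_images = noise_scheduler.add_noise(clean_images, noise, timesteps)
    noise_pred = model(noisy_images, timesteps).sample
    return F.mse_loss(noise_pred, noise)
```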

Have you ever observed similar behavior, or do you have ideas about what the problem could be? I’m starting to get stumped. Any thoughts are appreciated!

If you’re interested, the code is on GitHub: Link.
Because I can’t tell what the problem is, I can’t point you to a specific part of the code…
In the repository you can find different Jupyter notebooks, e.g. lesion_filling_unconditioned_repaint.ipynb. The core Python code is inside the folder “custom_modules”. The relevant files are “Training.py”, “TrainingConditional”, “Evaluation2D.py”, “Evaluation2DFilling.py” and “DDIMInpaintPipeline.py”.

Best regards,
Vinzenz

After a few weeks I could solve it. The problem was that I just needed a longer training time: 1 day was not enough, it needed around 3 days.

I thought the model had converged based on the loss progression. The thing is that different timesteps have different learning progressions. While the larger timesteps (e.g. 980) have a smaller error (because they can learn almost the identity for noise prediction), the smaller timesteps (e.g. 30) have a bigger error. Looking at the loss curve we can see spikes resulting from the different timesteps. While the smaller timesteps converged at some point, the larger timesteps were still learning and had not generalized. If you’re interested, I documented this in the report of my master’s thesis: Thesis_Diffusion_Lesions/08_Report/Report.pdf at main · vinzenzuhr/Thesis_Diffusion_Lesions · GitHub
Figure A.5 in the Appendix shows the loss of different timesteps.
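If you want to see this effect yourself, a simple way is to log the loss per timestep bucket instead of only the overall average. A rough sketch (placeholder names, not the exact code from my repo):

```python
import torch
import torch.nn.functional as F
from collections import defaultdict

@torch.no_grad()
def loss_per_timestep_bucket(model, noise_scheduler, clean_images, n_buckets=10):
    """MSE noise-prediction loss grouped into timestep buckets (0-99, 100-199, ...).
    The overall average hides that small and large timesteps converge at very
    different speeds."""
    T = noise_scheduler.config.num_train_timesteps
    noise = torch.randn_like(clean_images)
    timesteps = torch.randint(0, T, (clean_images.shape[0],), device=clean_images.device)
    noisy = noise_scheduler.add_noise(clean_images, noise, timesteps)
    pred = model(noisy, timesteps).sample
    per_sample = F.mse_loss(pred, noise, reduction="none").mean(dim=(1, 2, 3))
    buckets = defaultdict(list)
    for t, l in zip(timesteps.tolist(), per_sample.tolist()):
        buckets[t * n_buckets // T].append(l)
    return {b: sum(v) / len(v) for b, v in sorted(buckets.items())}
```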

Best,
Vinzenz
