this isn’t training from scratch, but from hacking around with the EveryDream2 stable diffusion fine-tuner i was able to get a fairly useful and reliable loss graph by holding the noise seed fixed when running a (fixed-set, fixed-sequence) validation pass.
the intuition is that because the diffusion process relies so heavily on noise, variance in that noise between validation passes tends to overwhelm the relatively small signal of decreasing loss. to correct for this, re-seed the noise RNG with the same seed every time you run a validation pass (i used the isolate_rng() context manager, iirc from pytorch lightning, so the train RNG doesn’t also get re-seeded). you’re still at the mercy of whatever sequence of noises that particular seed gives you for validation, but you should find the loss curve traces a more clearly decreasing trajectory (even if the decrease is small). a sketch of what i mean is below.
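here’s a minimal sketch of the kind of validation pass i mean, assuming a diffusers-style setup — the unet/vae/text_encoder/noise_scheduler names, the seed value, and the epsilon-prediction MSE loss are illustrative, not EveryDream2’s actual code, and the isolate_rng import path is the pytorch lightning one as i remember it:

```python
import torch
import torch.nn.functional as F
from pytorch_lightning.utilities.seed import isolate_rng  # saves/restores RNG state on exit

VAL_NOISE_SEED = 555  # arbitrary; what matters is that it never changes between passes

@torch.no_grad()
def validation_loss(unet, vae, text_encoder, noise_scheduler, val_batches, device):
    """Average validation loss with the noise RNG pinned to a fixed seed."""
    losses = []
    with isolate_rng():  # training RNG state is untouched outside this block
        torch.manual_seed(VAL_NOISE_SEED)  # same noise sequence on every validation pass
        for batch in val_batches:  # must be a fixed set, iterated in a fixed order
            latents = vae.encode(batch["pixel_values"].to(device)).latent_dist.sample()
            latents = latents * 0.18215  # SD1.x latent scaling factor
            noise = torch.randn_like(latents)  # deterministic given the seed above
            timesteps = torch.randint(
                0, noise_scheduler.config.num_train_timesteps,
                (latents.shape[0],), device=device,
            )
            noisy_latents = noise_scheduler.add_noise(latents, noise, timesteps)
            encoder_hidden_states = text_encoder(batch["input_ids"].to(device))[0]
            pred = unet(noisy_latents, timesteps, encoder_hidden_states).sample
            losses.append(F.mse_loss(pred.float(), noise.float()).item())
    return sum(losses) / len(losses)
```

note that the timestep sampling and the VAE’s latent sampling also draw from the seeded RNG, so those get pinned too — which is what you want if the numbers are to be comparable across passes.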
fwiw, contrary to the link @sayakpaul provided, this loss curve is informative — it pretty reliably indicates when fine-tuning loss has reached a minimum, and it can be trusted to trend upward in a way that reflects the model overfitting the training data.
example: https://huggingface.co/damian0815/pashahlis-val-test-1e-6-ep30
i’m surprised no other stable diffusion fine-tuners have implemented this. also a bit suspicious…