Just wondering: does the learning rate decay in the train_text_to_image.py script? I’m resuming from a checkpoint quite a long way into training (360k steps) and I’m still seeing learning_rate=0.0001 in the progress indicator.
I always assumed that was just showing the initial learning rate, but is it supposed to reflect the actual learning rate at the current step (or epoch)? I’m only asking because improvement in my outputs is extremely slow. I expected it to be slow, but it seems almost conspicuously slow, and if the learning rate isn’t decreasing (and is generally too high) that might explain it… it’s potentially bouncing around the minimum rather than settling into it.
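For context, here’s my rough mental model of where a “current” learning rate can even be read from (a standalone PyTorch sketch, not the actual script; the toy model and step counts are just placeholders): with a decaying schedule, both the scheduler and the optimizer report the live value, so a progress bar fed from either of them should drop over time rather than sit at the configured 1e-4.

```python
import torch

# Standalone sketch: with a decaying schedule, the live rate reported by the
# scheduler and by the optimizer move together, while the configured
# --learning_rate argument obviously never changes.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for _ in range(500):
    optimizer.step()
    scheduler.step()

print(scheduler.get_last_lr()[0])        # live (decayed) rate from the scheduler
print(optimizer.param_groups[0]["lr"])   # same value, read from the optimizer
```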
The launch process does indicate: “All scheduler states loaded successfully”, so I’m assuming it’s resuming correctly from my checkpoint.
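(For reference, this is roughly how I understand the resume mechanics, assuming the script relies on Accelerate’s checkpointing; the checkpoint path and tiny model below are just placeholders: whatever goes through prepare() has its state saved and restored, optimizer and scheduler included.)

```python
import torch
from accelerate import Accelerator

# Sketch of my understanding of Accelerate checkpointing (assumed, not lifted
# from train_text_to_image.py): objects passed through prepare() get their
# state written by save_state() and read back by load_state() on resume.
accelerator = Accelerator()

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda _: 1.0)  # effectively "constant"

model, optimizer, lr_scheduler = accelerator.prepare(model, optimizer, lr_scheduler)

accelerator.save_state("checkpoint-360000")  # hypothetical checkpoint directory
accelerator.load_state("checkpoint-360000")  # where the "loaded successfully" messages come from, I believe
```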
I’ve just started another run for another 20 epochs and added --learning_rate=1e-6 to my command, but the progress bar still shows:
Steps: 5%|▍ | 11397/244100 [22:26<72:12:34, 1.12s/it, lr=0.0001, step_loss=0.0744]
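One thing I’m starting to suspect (sketched below with plain PyTorch rather than the actual script, so please correct me if the resume logic works differently): if resuming restores the optimizer’s state dict from the checkpoint, it would bring the old 1e-4 back with it and effectively ignore the new --learning_rate I passed.

```python
import torch

model = torch.nn.Linear(4, 4)

# First run was started with lr=1e-4 and checkpointed.
old_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
saved_state = old_optimizer.state_dict()

# Resumed run: I pass --learning_rate=1e-6, so the fresh optimizer starts there...
new_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)
print(new_optimizer.param_groups[0]["lr"])   # 1e-06

# ...but restoring the checkpointed state brings the param groups (lr included) back.
new_optimizer.load_state_dict(saved_state)
print(new_optimizer.param_groups[0]["lr"])   # 0.0001 again
```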
Any clarification appreciated.
UPDATE: Digging in a bit more, comet_ml is showing the learning rate as 1e-6, which is what I set… so which one is right? It also shows lr_scheduler as “constant”, which surprised me. Does this training script use a constant learning rate by default, and is that the best approach for diffusion fine-tuning?
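For what it’s worth, here’s a minimal sketch of what a “constant” schedule means in practice, calling diffusers’ get_scheduler directly outside the training script (the 1000-step loop and toy model are just for illustration): the reported rate never moves, however far in you are, whereas a decaying choice like “cosine” winds it down towards zero.

```python
import torch
from diffusers.optimization import get_scheduler

def final_lr(schedule_name: str) -> float:
    """Run a schedule for 1000 steps and return the rate it ends on."""
    model = torch.nn.Linear(4, 4)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scheduler = get_scheduler(
        schedule_name,
        optimizer=optimizer,
        num_warmup_steps=0,
        num_training_steps=1000,
    )
    for _ in range(1000):
        optimizer.step()
        scheduler.step()
    return scheduler.get_last_lr()[0]

print(final_lr("constant"))  # still 1e-4 after 1000 steps: no decay at all
print(final_lr("cosine"))    # decayed essentially to zero by the end
```

So if comet_ml is right about the schedule being constant, the lr in the progress bar never changing would at least be consistent with that.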