Just wondering: does the learning rate decay in the train_text_to_image.py script? I’m resuming from a checkpoint quite a long way into training (360k steps) and I’m still seeing learning_rate=0.0001 in the progress indicator.
I always assumed that was just showing the initial learning rate, but is it supposed to reflect the actual learning rate at the current step (or epoch)? I’m only asking because improvement in my outputs is extremely slow. I expected it to be slow, but it seems almost conspicuously slow, and if the learning rate isn’t decreasing (and is generally too high) that might explain it… it’s potentially bouncing around the minimum rather than settling into it.
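For context, here’s my rough mental model of where a “current” learning rate can even be read from (a standalone PyTorch sketch, not the actual script; the toy model and step counts are just placeholders): with a decaying schedule, both the scheduler and the optimizer report the live value, so a progress bar fed from either of them should drop over time rather than sit at the configured 1e-4.

```python
import torch

# Standalone sketch: with a decaying schedule, the live rate reported by the
# scheduler and by the optimizer move together, while the configured
# --learning_rate argument obviously never changes.
model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
scheduler = torch.optim.lr_scheduler.CosineAnnealingLR(optimizer, T_max=1000)

for _ in range(500):
    optimizer.step()
    scheduler.step()

print(scheduler.get_last_lr()[0])        # live (decayed) rate from the scheduler
print(optimizer.param_groups[0]["lr"])   # same value, read from the optimizer
```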
The launch process does indicate: “All scheduler states loaded successfully”, so I’m assuming it’s resuming correctly from my checkpoint.
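(For reference, this is roughly how I understand the resume mechanics, assuming the script relies on Accelerate’s checkpointing; the checkpoint path and tiny model below are just placeholders: whatever goes through prepare() has its state saved and restored, optimizer and scheduler included.)

```python
import torch
from accelerate import Accelerator

# Sketch of my understanding of Accelerate checkpointing (assumed, not lifted
# from train_text_to_image.py): objects passed through prepare() get their
# state written by save_state() and read back by load_state() on resume.
accelerator = Accelerator()

model = torch.nn.Linear(4, 4)
optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
lr_scheduler = torch.optim.lr_scheduler.LambdaLR(optimizer, lambda _: 1.0)  # effectively "constant"

model, optimizer, lr_scheduler = accelerator.prepare(model, optimizer, lr_scheduler)

accelerator.save_state("checkpoint-360000")  # hypothetical checkpoint directory
accelerator.load_state("checkpoint-360000")  # where the "loaded successfully" messages come from, I believe
```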
I’ve just started another run for another 20 epochs and added --learning_rate=1e-6 to my command, but the progress bar still shows:
Steps: 5%|▍ | 11397/244100 [22:26<72:12:34, 1.12s/it, lr=0.0001, step_loss=0.0744]
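One thing I’m starting to suspect (sketched below with plain PyTorch rather than the actual script, so please correct me if the resume logic works differently): if resuming restores the optimizer’s state dict from the checkpoint, it would bring the old 1e-4 back with it and effectively ignore the new --learning_rate I passed.

```python
import torch

model = torch.nn.Linear(4, 4)

# First run was started with lr=1e-4 and checkpointed.
old_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
saved_state = old_optimizer.state_dict()

# Resumed run: I pass --learning_rate=1e-6, so the fresh optimizer starts there...
new_optimizer = torch.optim.AdamW(model.parameters(), lr=1e-6)
print(new_optimizer.param_groups[0]["lr"])   # 1e-06

# ...but restoring the checkpointed state brings the param groups (lr included) back.
new_optimizer.load_state_dict(saved_state)
print(new_optimizer.param_groups[0]["lr"])   # 0.0001 again
```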
Any clarification appreciated.
UPDATE: Digging in a bit more, comet_ml is showing the learning rate as 1e-6, which is what I set… so which one is right? It also shows lr_scheduler as “constant”, which surprised me. Does this training script use a constant learning rate by default, and is that the best approach for diffusion fine-tuning?
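For what it’s worth, here’s a minimal sketch of what a “constant” schedule means in practice, calling diffusers’ get_scheduler directly outside the training script (the 1000-step loop and toy model are just for illustration): the reported rate never moves, however far in you are, whereas a decaying choice like “cosine” winds it down towards zero.

```python
import torch
from diffusers.optimization import get_scheduler

def final_lr(schedule_name: str) -> float:
    """Run a schedule for 1000 steps and return the rate it ends on."""
    model = torch.nn.Linear(4, 4)
    optimizer = torch.optim.AdamW(model.parameters(), lr=1e-4)
    scheduler = get_scheduler(
        schedule_name,
        optimizer=optimizer,
        num_warmup_steps=0,
        num_training_steps=1000,
    )
    for _ in range(1000):
        optimizer.step()
        scheduler.step()
    return scheduler.get_last_lr()[0]

print(final_lr("constant"))  # still 1e-4 after 1000 steps: no decay at all
print(final_lr("cosine"))    # decayed essentially to zero by the end
```

So if comet_ml is right about the schedule being constant, the lr in the progress bar never changing would at least be consistent with that.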