Learning rate zero?

Hi All,

I’ve just noticed something pretty horrible… I just stopped a fairly long run because the learning rate was being reported as zero. (??) I went to comet.com to check the state of the run and it clearly shows that the hyperparameter is set to 1.0E-5, which my command line history confirms as well. I did set a linear learning rate schedule, but I don’t see why that would cause this…

Is it possible the scheduler is somehow corrupted, or that there’s something in there that has messed with the overall schedule?
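For reference, this is my understanding of how a linear schedule behaves: the LR decays linearly to zero over the total step budget and then stays clamped at zero. A minimal sketch of the formula I believe `get_linear_schedule_with_warmup` uses (the base LR and step counts here are made up):

```python
def linear_lr(step, base_lr=1e-5, warmup_steps=0, total_steps=10_000):
    """Linear warmup followed by linear decay to zero at total_steps."""
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    # Decays linearly and clamps at 0.0 once step >= total_steps.
    return base_lr * max(0.0, (total_steps - step) / max(1, total_steps - warmup_steps))

print(linear_lr(5_000))   # halfway through the schedule: ~5e-6
print(linear_lr(10_000))  # end of the schedule: 0.0
print(linear_lr(12_000))  # resumed past the end: still 0.0
```

So if resuming keeps incrementing the global step past the original `total_steps`, the reported LR of zero would be expected behaviour rather than corruption, but I’d like to confirm that’s what’s happening.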

This is the scheduler_config.json (though I think this is pretty irrelevant):

  "_class_name": "PNDMScheduler",
  "_diffusers_version": "0.15.0.dev0",
  "beta_end": 0.012,
  "beta_schedule": "scaled_linear",
  "beta_start": 0.00085,
  "clip_sample": false,
  "num_train_timesteps": 1000,
  "prediction_type": "epsilon",
  "set_alpha_to_one": false,
  "skip_prk_steps": true,
  "steps_offset": 1,
  "trained_betas": null

Is there anywhere I can check to find the source of this error/behaviour?

I should note that the model clearly did make some progress, though that might have been in a previous run. I’ve resumed a couple of times at this point.

The last log entry I can see with a non-zero lr shows 9.28e-7.
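If the schedule really is plain linear decay from the base LR down to zero, that last value would imply the run was nearly at the end of its step budget. A quick sanity check, assuming a base LR of 1.0e-5 and no warmup (both assumptions on my part):

```python
base_lr = 1.0e-5
last_lr = 9.28e-7

# Under linear decay, lr / base_lr is the fraction of the schedule remaining.
fraction_remaining = last_lr / base_lr
print(f"roughly {1 - fraction_remaining:.1%} of the schedule consumed")
```

That would mean only about 9% of the scheduled steps were left when the LR was last non-zero, which is consistent with the resumes having pushed the global step past the original total.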