T5 training with Trainer and Adafactor

Everything works when I use my own training loop. However, when I switch to Trainer, the loss doesn't decrease.

Every step, it prints something like:
{'loss': 11.3911, 'learning_rate': 0.0, 'epoch': 0.01}
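In the Trainer versions I'm familiar with, the logged `learning_rate` is read from the scheduler (`lr_scheduler.get_last_lr()`), so a value stuck at 0.0 points at the schedule rather than at Adafactor itself. How `lr_scheduler` was built isn't shown in this post. Assuming it is a linear warmup-then-decay schedule (the shape used by `transformers.get_linear_schedule_with_warmup`), this pure-Python sketch of its multiplier shows how a `num_training_steps` that is too small (hypothetically 0) pins the rate at zero after warmup. `linear_warmup_decay_factor` is an illustrative helper, not a library function.

```python
def linear_warmup_decay_factor(step: int, num_warmup_steps: int, num_training_steps: int) -> float:
    """Multiplier applied to the base learning rate at a given optimizer step.

    Assumption: mirrors the shape of a linear warmup followed by a linear
    decay to zero; it is an illustration, not the library implementation.
    """
    if step < num_warmup_steps:
        # Warmup: ramps from 0.0 at step 0 up to 1.0 at num_warmup_steps.
        return step / max(1, num_warmup_steps)
    # Decay: falls linearly from 1.0 to 0.0 at num_training_steps, clamped at 0.0.
    return max(0.0, (num_training_steps - step) / max(1, num_training_steps - num_warmup_steps))


if __name__ == "__main__":
    base_lr = 1e-3
    # Sensibly sized schedule: the LR is non-zero for most of training.
    print([base_lr * linear_warmup_decay_factor(s, 10, 1000) for s in (0, 5, 10, 500)])
    # Hypothetical mistake: num_training_steps far too small (here 0),
    # so every step gets a factor of 0.0 and the weights never move.
    print([base_lr * linear_warmup_decay_factor(s, 0, 0) for s in (0, 1, 100)])
```

Under this assumption, making `num_training_steps` match the real number of optimizer steps (or using a constant schedule) keeps the rate non-zero after warmup.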

I'm using Adafactor, created like this:

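```python
from transformers.optimization import Adafactor
# relative_step=False and scale_parameter=False: Adafactor uses lr=0.001 as-is instead of its own step sizing.
optimizer = Adafactor(model.parameters(), lr=0.001, eps=(1e-30, 1e-3), clip_threshold=1.0, decay_rate=-0.8,
                      beta1=None, weight_decay=0.0, scale_parameter=False, relative_step=False)
```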

I don't specify the optimizer in TrainingArguments, because I pass it to the Trainer instead (I hope that's right). In Trainer() I pass:

```python
optimizers=(optimizer, lr_scheduler)
```
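The post doesn't show how `lr_scheduler` is created. As a hedged sketch, assuming a fixed learning rate is what's wanted (which matches `relative_step=False`), `transformers.optimization.get_constant_schedule` keeps the multiplier at 1.0, so the rate the Trainer logs should stay at 0.001. `build_optimizer_and_scheduler` is a hypothetical helper; `model`, `training_args` and `train_dataset` in the commented usage stand for the post's own objects.

```python
from typing import Tuple

import torch
from transformers.optimization import Adafactor, get_constant_schedule


def build_optimizer_and_scheduler(
    model: torch.nn.Module,
) -> Tuple[torch.optim.Optimizer, torch.optim.lr_scheduler.LambdaLR]:
    """Hypothetical helper: Adafactor with a fixed LR plus a constant schedule."""
    optimizer = Adafactor(
        model.parameters(),
        lr=1e-3,
        eps=(1e-30, 1e-3),
        clip_threshold=1.0,
        decay_rate=-0.8,
        beta1=None,
        weight_decay=0.0,
        scale_parameter=False,
        relative_step=False,
    )
    # The scheduler must wrap this same optimizer object; the constant
    # schedule multiplies the base LR by 1 at every step.
    lr_scheduler = get_constant_schedule(optimizer)
    return optimizer, lr_scheduler


# Usage with Trainer (model, training_args and train_dataset are the post's own objects):
# optimizer, lr_scheduler = build_optimizer_and_scheduler(model)
# trainer = Trainer(model=model, args=training_args, train_dataset=train_dataset,
#                   optimizers=(optimizer, lr_scheduler))
# trainer.train()
```

A constant schedule is only one option; a warmup schedule also works, provided its step counts match the actual number of optimizer steps.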
