I have a problem running Adafactor with the Trainer.
When I write my own training loop, everything works well with Adafactor.
When I use the Trainer with a constant learning rate, everything also works well.
But when I try to use the Trainer with Adafactor, it prints a learning rate of 0 at every step, and naturally the training loss does not decrease. Here’s what I do:
from transformers.optimization import Adafactor, AdafactorSchedule

optimizer = Adafactor(
    model.parameters(),
    lr=0.001,
    eps=(1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.0,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
lr_scheduler = AdafactorSchedule(optimizer)
training_args = TrainingArguments(
optim='adafactor',
...
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=training_set,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    optimizers=(optimizer, lr_scheduler),
)
What am I missing? Should the optimizer be specified via the Trainer's optimizers parameter, via optim in TrainingArguments, or both? This is a bit confusing.
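To make the question concrete, here is a minimal sketch of the two setups I am deciding between, as I understand them (the class names are from transformers.optimization; output_dir and the hyperparameter values are just placeholders):

```python
from transformers import Trainer, TrainingArguments
from transformers.optimization import Adafactor, AdafactorSchedule

# Option A: let the Trainer build Adafactor itself from TrainingArguments.
args_a = TrainingArguments(output_dir="out", optim="adafactor", learning_rate=1e-3)
trainer_a = Trainer(model=model, args=args_a, train_dataset=training_set)

# Option B: build the optimizer and scheduler manually and pass them in,
# leaving optim in TrainingArguments at its default.
args_b = TrainingArguments(output_dir="out")
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
lr_scheduler = AdafactorSchedule(optimizer)
trainer_b = Trainer(
    model=model,
    args=args_b,
    train_dataset=training_set,
    optimizers=(optimizer, lr_scheduler),
)
```

Is one of these the intended usage, or can they be combined?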