T5 Finetuning Tips

I have a problem running Adafactor with the Trainer.
When I write my own training loop, everything works well with Adafactor.
When I use the Trainer with a constant learning rate, everything also works well.
But when I try to use the Trainer with Adafactor, the learning rate printed at each step is 0, and naturally the training loss does not decrease. Here's what I do:


from transformers import Trainer, TrainingArguments
from transformers.optimization import Adafactor, AdafactorSchedule

optimizer = Adafactor(model.parameters(), lr=0.001, eps=(1e-30, 1e-3), clip_threshold=1.0,
                      decay_rate=-0.8, beta1=None, weight_decay=0.0,
                      scale_parameter=False, relative_step=False, warmup_init=False)

lr_scheduler = AdafactorSchedule(optimizer)

training_args = TrainingArguments(
    optim='adafactor',   # optimizer also selected here
    ...
)

trainer = Trainer(model=model,
                  args=training_args,
                  train_dataset=training_set,
                  eval_dataset=val_dataset,
                  tokenizer=tokenizer,
                  optimizers=(optimizer, lr_scheduler),   # and my optimizer/scheduler passed here
                  )

What am I missing? Should the optimizer be passed to the Trainer via its optimizers parameter, or to TrainingArguments as optim, or to both? This is a bit confusing.
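
For reference, here is roughly what my standalone training loop looks like (a minimal sketch; model, tokenizer and train_dataloader stand in for my actual objects):

from transformers.optimization import Adafactor

# Same Adafactor settings as in the Trainer version above
optimizer = Adafactor(model.parameters(), lr=0.001, eps=(1e-30, 1e-3), clip_threshold=1.0,
                      decay_rate=-0.8, beta1=None, weight_decay=0.0,
                      scale_parameter=False, relative_step=False, warmup_init=False)

model.train()
for batch in train_dataloader:
    outputs = model(**batch)   # forward pass; the loss is returned because labels are in the batch
    loss = outputs.loss
    loss.backward()
    optimizer.step()           # here the fixed lr=0.001 is actually applied
    optimizer.zero_grad()

With this loop the learning rate stays at 0.001 and the loss decreases as expected.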
