I have a problem running Adafactor with the Trainer.
When I write my own training loop, everything works well with Adafactor.
When I use the Trainer with a constant learning rate, everything also works well.
But when I try to use the Trainer with Adafactor, it prints a learning rate of 0 at every step, and naturally the training loss does not decrease. Here’s what I do:
from transformers.optimization import Adafactor, AdafactorSchedule

optimizer = Adafactor(
    model.parameters(),
    lr=0.001,
    eps=(1e-30, 1e-3),
    clip_threshold=1.0,
    decay_rate=-0.8,
    beta1=None,
    weight_decay=0.0,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
lr_scheduler = AdafactorSchedule(optimizer)
training_args = TrainingArguments(
optim='adafactor',
...
)
trainer = Trainer(
    model=model,
    args=training_args,
    train_dataset=training_set,
    eval_dataset=val_dataset,
    tokenizer=tokenizer,
    optimizers=(optimizer, lr_scheduler),
)
What am I missing? Should the optimizer be specified via the Trainer's optimizers parameter, via optim in TrainingArguments, or both? This is a bit confusing.
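To make the question concrete, here is a minimal sketch of the two setups I am deciding between, as I understand them (the class names are from transformers.optimization; output_dir and the hyperparameter values are just placeholders):

```python
from transformers import Trainer, TrainingArguments
from transformers.optimization import Adafactor, AdafactorSchedule

# Option A: let the Trainer build Adafactor itself from TrainingArguments.
args_a = TrainingArguments(output_dir="out", optim="adafactor", learning_rate=1e-3)
trainer_a = Trainer(model=model, args=args_a, train_dataset=training_set)

# Option B: build the optimizer and scheduler manually and pass them in,
# leaving optim in TrainingArguments at its default.
args_b = TrainingArguments(output_dir="out")
optimizer = Adafactor(
    model.parameters(),
    lr=1e-3,
    scale_parameter=False,
    relative_step=False,
    warmup_init=False,
)
lr_scheduler = AdafactorSchedule(optimizer)
trainer_b = Trainer(
    model=model,
    args=args_b,
    train_dataset=training_set,
    optimizers=(optimizer, lr_scheduler),
)
```

Is one of these the intended usage, or can they be combined?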