Trainer Ignoring Weight Decay, Beta arguments

Hello, this is probably an easy one to answer.

Trainer seems to be ignoring my weight decay and adam_beta arguments. I will be training a model for many epochs, each consisting of many steps, and I want to slow down the rate at which the learning rate falls off so I don't end up with a virtually zero learning rate after a few epochs.

Specifying weight_decay=0 and increasing adam_beta1 and adam_beta2 does not seem to change how quickly the learning rate decays.
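For reference, these are the standard transformers.TrainingArguments parameters I mean (a minimal sketch; the values are placeholders):

from transformers import TrainingArguments

# Placeholder values; the point is only which arguments I was adjusting.
training_args = TrainingArguments(
    output_dir="out",
    learning_rate=5e-5,
    num_train_epochs=20,
    weight_decay=0.0,   # disabling weight decay
    adam_beta1=0.95,    # raising the Adam betas
    adam_beta2=0.999,
)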

Am I missing something here? Or is this intended behavior? The weight decay also seems to be altered on a per-step basis, and I don't see any arguments that let me change the "weight decay strategy". Is the only way to alter that to subclass Trainer and override the scheduler?

It turns out it wasn’t ignoring my arguments, but that I was trying to solve the wrong problem.

I needed to leave the Adam arguments alone. Even with weight decay set to 0, the Trainer uses a learning rate scheduler that defaults to linear decay. To change it, subclass Trainer and create a custom scheduler, as seen in other forum posts; lowering power gives a slower decrease in the learning rate:

import torch
from transformers import Trainer, get_polynomial_decay_schedule_with_warmup


class CustomTrainer(Trainer):
    def create_optimizer_and_scheduler(self, num_training_steps):
        # Plain AdamW; weight_decay comes straight from TrainingArguments.
        self.optimizer = torch.optim.AdamW(self.model.parameters(),
                                           lr=self.args.learning_rate,
                                           weight_decay=self.args.weight_decay)
        # Polynomial decay instead of the default linear schedule.
        # No warmup steps; power < 1 makes the learning rate fall off more slowly.
        self.lr_scheduler = get_polynomial_decay_schedule_with_warmup(
            self.optimizer, num_warmup_steps=0,
            num_training_steps=num_training_steps, power=0.5)
 
trainer = CustomTrainer(
    model=model,
    args=training_args,
    train_dataset=input_ds['train'],
    eval_dataset=input_ds['test'],
    tokenizer=tokenizer,
    compute_metrics=compute_metrics,
)
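As a quick sanity check, you can drive the scheduler by hand with a throwaway optimizer to see how power shapes the curve (illustrative values; power=1.0 reproduces the default linear decay, while power=0.5 keeps the learning rate higher for longer):

import torch
from transformers import get_polynomial_decay_schedule_with_warmup

# Dummy parameter/optimizer, used only to drive the scheduler.
dummy_opt = torch.optim.AdamW([torch.nn.Parameter(torch.zeros(1))], lr=1e-4)
sched = get_polynomial_decay_schedule_with_warmup(
    dummy_opt, num_warmup_steps=0, num_training_steps=1000, power=0.5)

for step in range(1001):
    if step % 250 == 0:
        print(f"step {step}: lr = {sched.get_last_lr()[0]:.2e}")
    dummy_opt.step()
    sched.step()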