How do I use lr_scheduler?

How do I use lr_scheduler in Trainer? It seems that whenever I pass an AdamW optimizer, it also needs the dictionary of params to tune. Since I am just using the plain Trainer (and am not very familiar with PyTorch), the parameters are not exposed for me to pass to AdamW, which yields an error.

Does anyone have an idea of how I can do that?

Hi @Neel-Gupta, you’ll need to create a custom trainer by subclassing Trainer and overriding the create_optimizer_and_scheduler function (see here for the source code):

from transformers import Trainer

class MyAwesomeTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Add custom attributes here

    def create_optimizer_and_scheduler(self, num_training_steps):
        ...  # Build self.optimizer and self.lr_scheduler here

Assuming that you’re trying to learn some custom parameters, the idea is to add a dict like

{"params": [p for n, p in self.model.named_parameters()  if "name_of_custom_params" in n and p.requires_grad], "lr": self.args.custom_params_lr}

to the optimizer_grouped_parameters list you can see in the source code. Then you can add the remaining bits with something like the following:

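from torch.optim import AdamW
from transformers import get_linear_schedule_with_warmup

def create_optimizer_and_scheduler(self, num_training_steps: int):
    no_decay = ["bias", "LayerNorm.weight"]
    # Add any new parameters to optimize for here as a new dict in the list of dicts
    optimizer_grouped_parameters = ...

    # Assumes the base learning rate comes from self.args.learning_rate
    self.optimizer = AdamW(optimizer_grouped_parameters, lr=self.args.learning_rate)
    self.lr_scheduler = get_linear_schedule_with_warmup(
        self.optimizer,
        num_warmup_steps=self.args.warmup_steps,
        num_training_steps=num_training_steps,
    )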
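
In case the grouping step is the confusing part, here is a minimal sketch of how the list of dicts could be built. It is plain Python, so you can try it without a model. The helper name build_param_groups and its arguments are my own invention (not part of transformers), and it assumes you want three disjoint groups: weights with decay, bias/LayerNorm without decay, and your custom parameters with their own learning rate.

```python
def build_param_groups(named_params, custom_key, custom_lr, weight_decay,
                       no_decay=("bias", "LayerNorm.weight")):
    """Split (name, param) pairs into AdamW-style parameter groups.

    Hypothetical helper: `custom_key` is a substring that identifies
    the custom parameters by name.
    """
    custom, decay, plain = [], [], []
    for name, param in named_params:
        # Skip frozen parameters; objects without the attribute count as trainable.
        if not getattr(param, "requires_grad", True):
            continue
        if custom_key in name:
            custom.append(param)   # gets its own learning rate
        elif any(nd in name for nd in no_decay):
            plain.append(param)    # no weight decay for bias / LayerNorm
        else:
            decay.append(param)    # everything else gets weight decay
    return [
        {"params": decay, "weight_decay": weight_decay},
        {"params": plain, "weight_decay": 0.0},
        {"params": custom, "lr": custom_lr},
    ]
```

Inside the trainer you would call it with self.model.named_parameters() and pass the result to AdamW. The groups are kept disjoint on purpose, because PyTorch optimizers reject a parameter that appears in more than one group.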

Does that make sense?

That seems pretty complicated :sweat_smile: I would probably work on this. Thanx a ton for your help!! :+1:


Haha, well at least you don’t have to implement all the other parts of the training loop :slight_smile:

What are you trying to do exactly with the lr scheduler?

I noticed that with the standard warmup_steps and weight_decay options, something seems to be misconfigured: after being stable and increasing slowly for quite a few epochs, the loss suddenly explodes.

I had the same problem before when using native TensorFlow, and I fixed it there by applying a scheduler and some custom callbacks, which also got me to a better accuracy faster.

Ah in that case can’t you just configure warmup_steps and weight_decay directly in the TrainingArguments?
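
To get a feel for what warmup_steps does, here is a rough sketch of the multiplier a linear warmup/decay schedule applies to the base learning rate. As far as I know this matches the shape of get_linear_schedule_with_warmup, but treat it as an illustration rather than the library code; the function name is made up.

```python
def linear_warmup_multiplier(step, num_warmup_steps, num_training_steps):
    """Factor applied to the base learning rate at a given optimizer step."""
    if step < num_warmup_steps:
        # Ramp linearly from 0 up to 1 during warmup.
        return step / max(1, num_warmup_steps)
    # Then decay linearly from 1 down to 0 at the end of training.
    remaining = num_training_steps - step
    return max(0.0, remaining / max(1, num_training_steps - num_warmup_steps))


if __name__ == "__main__":
    # Print the multiplier at a few points of a 1000-step run with 100 warmup steps.
    for s in (0, 50, 100, 550, 1000):
        print(s, linear_warmup_multiplier(s, 100, 1000))
```

If the loss blows up late in training, plotting this multiplier against your loss curve can help you check whether the learning rate is still high at that point.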

You can also change the scheduler type in case that’s what you’re after: transformers.trainer_utils — transformers 4.3.0 documentation
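
Something like the following is what I have in mind. The values are placeholders rather than recommendations, the output path is hypothetical, and it assumes a transformers version where TrainingArguments accepts lr_scheduler_type.

```python
# Placeholder settings; tune them for your own run.
training_kwargs = {
    "output_dir": "out",            # hypothetical output path
    "learning_rate": 5e-5,
    "warmup_steps": 500,            # linear warmup over the first 500 steps
    "weight_decay": 0.01,
    "lr_scheduler_type": "cosine",  # the default is "linear"
}


def build_training_arguments():
    """Create TrainingArguments from the settings above."""
    # Imported lazily so the settings can be inspected without loading transformers.
    from transformers import TrainingArguments

    return TrainingArguments(**training_kwargs)
```

You would then pass build_training_arguments() as the args of a plain Trainer, with no subclassing needed.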

Finally, you can also implement custom callbacks in transformers - see here: Callbacks — transformers 4.3.0 documentation