How to use lr_scheduler in Trainer? It seems that whenever I pass the AdamW optimizer, it also needs the dictionary of parameters to tune. Since I am using just the plain Trainer (not being intimate with PyTorch), the parameters are not exposed to pass to AdamW, which yields an error.
Hi @Neel-Gupta, you'll need to create a custom trainer by subclassing `Trainer` and overriding the `create_optimizer_and_scheduler` function (see here for the source code):
```python
class MyAwesomeTrainer(Trainer):
    def __init__(self, *args, **kwargs):
        super().__init__(*args, **kwargs)
        # Add custom attributes here

    def create_optimizer_and_scheduler(self, num_training_steps):
        pass
```
Assuming that you're trying to learn some custom parameters, the idea is to add a dict like

```python
{
    "params": [p for n, p in self.model.named_parameters()
               if "name_of_custom_params" in n and p.requires_grad],
    "lr": self.args.custom_params_lr,
}
```
to the `optimizer_grouped_parameters` list you can see in the source code. Then you can fill in the remaining bits with something like the following:
```python
from transformers import AdamW, get_linear_schedule_with_warmup

def create_optimizer_and_scheduler(self, num_training_steps: int):
    no_decay = ["bias", "LayerNorm.weight"]
    # Add any new parameters to optimize for here as a new dict in the list of dicts
    optimizer_grouped_parameters = ...
    self.optimizer = AdamW(
        optimizer_grouped_parameters,
        lr=self.args.learning_rate,
        eps=self.args.adam_epsilon,
    )
    self.lr_scheduler = get_linear_schedule_with_warmup(
        self.optimizer,
        num_warmup_steps=self.args.warmup_steps,
        num_training_steps=num_training_steps,
    )
```
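To make the idea concrete, here is a minimal, self-contained sketch of the same pattern using plain PyTorch equivalents (`torch.optim.AdamW` and a `LambdaLR` implementing linear warmup then linear decay), so it runs without `transformers`. The toy model, the choice of which layer counts as "custom", and the learning rates are all made up for illustration:

```python
import torch
from torch import nn
from torch.optim.lr_scheduler import LambdaLR

model = nn.Sequential(nn.Linear(4, 4), nn.Linear(4, 2))
# Pretend the second layer holds our custom parameters.
custom_names = {n for n, _ in model.named_parameters() if n.startswith("1.")}

no_decay = ["bias", "LayerNorm.weight"]
optimizer_grouped_parameters = [
    {   # custom parameters get their own learning rate
        "params": [p for n, p in model.named_parameters()
                   if n in custom_names and p.requires_grad],
        "lr": 1e-3,
    },
    {   # remaining parameters, with weight decay
        "params": [p for n, p in model.named_parameters()
                   if n not in custom_names
                   and not any(nd in n for nd in no_decay)],
        "weight_decay": 0.01,
    },
    {   # remaining parameters that should not be decayed (biases etc.)
        "params": [p for n, p in model.named_parameters()
                   if n not in custom_names
                   and any(nd in n for nd in no_decay)],
        "weight_decay": 0.0,
    },
]

optimizer = torch.optim.AdamW(optimizer_grouped_parameters, lr=5e-5, eps=1e-8)

# Linear warmup then linear decay, mimicking get_linear_schedule_with_warmup.
num_warmup_steps, num_training_steps = 10, 100

def lr_lambda(step):
    if step < num_warmup_steps:
        return step / max(1, num_warmup_steps)
    return max(0.0, (num_training_steps - step)
               / max(1, num_training_steps - num_warmup_steps))

lr_scheduler = LambdaLR(optimizer, lr_lambda)
```

Groups without an explicit `"lr"` fall back to the optimizer-level `lr=5e-5`, which is exactly how the `optimizer_grouped_parameters` list works in the trainer code above.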
I noticed that with the default `warmup_steps` and `weight_decay` there apparently is some misconfiguration of the loss: after being stable and changing only slowly for quite a few epochs, it suddenly explodes.
I had this problem before when using native TensorFlow, and I fixed it there by applying the scheduler together with some custom callbacks in TF, which reached a better accuracy faster.