Hi, how does the weight decay rate affect the learning rate?
From the documentation I gathered that lr_schedule is created like this and then passed to AdamWeightDecay as the learning_rate argument:
```python
lr_schedule = tf.keras.optimizers.schedules.PolynomialDecay(
    initial_learning_rate=init_lr,
    decay_steps=num_train_steps - num_warmup_steps,
    end_learning_rate=init_lr * min_lr_ratio,
)
```
It doesn't use the weight decay rate at all, and if I were to plot this schedule it would just be a straight line going from init_lr to end_lr.
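To make sure I'm reading the schedule right, here is a minimal pure-Python sketch (no TensorFlow; the function name and toy numbers are mine) of what I understand PolynomialDecay to compute with its default power=1:

```python
def polynomial_decay(step, init_lr, end_lr, decay_steps, power=1.0):
    # Clamp the step so the schedule stays at end_lr after decay_steps.
    step = min(step, decay_steps)
    frac = 1.0 - step / decay_steps
    return (init_lr - end_lr) * frac ** power + end_lr

# With power=1 this is a straight line from init_lr down to end_lr.
lrs = [polynomial_decay(s, init_lr=0.1, end_lr=0.01, decay_steps=9) for s in range(10)]
print([round(lr, 2) for lr in lrs])  # [0.1, 0.09, 0.08, ..., 0.02, 0.01]
```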
The weight decay rate is only used later, inside the AdamWeightDecay class:
```python
def _decay_weights_op(self, var, learning_rate, apply_state):
    do_decay = self._do_use_weight_decay(var.name)
    if do_decay:
        return var.assign_sub(
            learning_rate
            * var
            * apply_state[(var.device, var.dtype.base_dtype)]["weight_decay_rate"],
            use_locking=self._use_locking,
        )
    return tf.no_op()
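If I strip away the TF machinery, my reading of that op in plain Python (function name and toy numbers are mine) is just:

```python
def decay_weights(var, learning_rate, weight_decay_rate):
    # Mirror var.assign_sub(learning_rate * var * weight_decay_rate):
    # subtract learning_rate * weight_decay_rate * var from the weight.
    return var - learning_rate * weight_decay_rate * var

w = 2.0
w = decay_weights(w, learning_rate=0.1, weight_decay_rate=0.01)
print(round(w, 6))  # 2.0 - 0.1 * 0.01 * 2.0 = 1.998
```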
So is weight_decay_rate just another scalar that scales the learning rate for a particular training step?
For example, if my lr_schedule created by TensorFlow looked like this:
[0.1, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01]
and weight_decay_rate=0.01, would the final learning rate look like this?
[0.001, 0.0009, 0.0008, 0.0007, 0.0006, 0.0005, 0.0004, 0.0003, 0.0002, 0.0001]
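In code, the interpretation I'm asking about would be something like this (toy numbers from my example above):

```python
lr_schedule = [0.1, 0.09, 0.08, 0.07, 0.06, 0.05, 0.04, 0.03, 0.02, 0.01]
weight_decay_rate = 0.01

# Is the "effective" learning rate just the schedule scaled by weight_decay_rate?
effective = [round(lr * weight_decay_rate, 6) for lr in lr_schedule]
print(effective)  # [0.001, 0.0009, ..., 0.0001]
```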
Am I getting this right?