BERT model on Acceptability Judgement Task || Optimizer Grouped Parameters

I have just started learning about Transformers and the BERT model.
I referred to this notebook and got stuck here:

##@title Optimizer Grouped Parameters
#This code is taken from:

# Don't apply weight decay to any parameters whose names include these tokens.
# (Here, the BERT doesn't have `gamma` or `beta` parameters, only `bias` terms)
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'LayerNorm.weight']
# Separate the `weight` parameters from the `bias` parameters.
# - For the `weight` parameters, this specifies a 'weight_decay_rate' of 0.1.
# - For the `bias` parameters, the 'weight_decay_rate' is 0.0.
optimizer_grouped_parameters = [
    # Filter for all parameters whose names *don't* include 'bias' or 'LayerNorm.weight'.
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.1},
    # Filter for parameters which *do* include those.
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.0}
]
# Note - `optimizer_grouped_parameters` only includes the parameter values, not 
# the names.
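
As far as I can tell, the grouping only splits parameters by name, and weight decay then shrinks the first group a little on every step. Here is a minimal sketch of my understanding, not the notebook's code: the parameter names are illustrative examples in the style of a Hugging Face BERT classifier, the values are placeholder strings instead of tensors, and `uses_no_decay` and `decay_step` are hypothetical helpers I wrote for illustration.

```python
# Illustrative parameter names (assumed, not dumped from the real model);
# the second element stands in for the tensor.
named_parameters = [
    ("bert.encoder.layer.0.attention.self.query.weight", "W_q"),
    ("bert.encoder.layer.0.attention.self.query.bias", "b_q"),
    ("bert.encoder.layer.0.attention.output.LayerNorm.weight", "ln_w"),
    ("bert.encoder.layer.0.attention.output.LayerNorm.bias", "ln_b"),
    ("classifier.weight", "W_cls"),
    ("classifier.bias", "b_cls"),
]
no_decay = ["bias", "LayerNorm.weight"]


def uses_no_decay(name):
    """True if any no_decay token is a substring of the parameter name."""
    return any(nd in name for nd in no_decay)


# Same filter as in the notebook cell, but keeping the names for inspection.
decayed = [n for n, _ in named_parameters if not uses_no_decay(n)]
not_decayed = [n for n, _ in named_parameters if uses_no_decay(n)]

print("decayed:    ", decayed)
print("not decayed:", not_decayed)


def decay_step(weight, lr, weight_decay):
    """One decoupled weight-decay shrink; the gradient term is omitted."""
    return weight - lr * weight_decay * weight


# Group 1 (rate 0.1) is pulled slightly toward zero; group 2 (rate 0.0) is not.
print(decay_step(1.0, lr=0.1, weight_decay=0.1))
print(decay_step(1.0, lr=0.1, weight_decay=0.0))
```

So only the two plain `weight` matrices end up in the decayed group, while every bias and the LayerNorm scale are left alone. One thing I am unsure about: `torch.optim.AdamW` reads the per-group key `weight_decay`, so whether `weight_decay_rate` is actually honored may depend on which optimizer the notebook passes these groups to.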

Can anyone please explain how weight decay and these parameter groups affect the model, or point me to a reference where I can learn about it?