I have just started learning about Transformers and the BERT model. I was following this notebook: https://github.com/PacktPublishing/Transformers-for-Natural-LanguageProcessing/blob/main/Chapter02/BERT_Fine_Tuning_Sentence_Classification_DR.ipynb and got stuck at this cell (In [41]):
#@title Optimizer Grouped Parameters
#This code is taken from:
# https://github.com/huggingface/transformers/blob/5bfcd0485ece086ebcbed2d008813037968a9e58/examples/run_glue.py#L102
# Don't apply weight decay to any parameters whose names include these tokens.
# (Here, the BERT doesn't have `gamma` or `beta` parameters, only `bias` terms)
param_optimizer = list(model.named_parameters())
no_decay = ['bias', 'LayerNorm.weight']
# Separate the `weight` parameters from the `bias` parameters.
# - For the `weight` parameters, this specifies a 'weight_decay_rate' of 0.01.
# - For the `bias` parameters, the 'weight_decay_rate' is 0.0.
optimizer_grouped_parameters = [
    # Filter for all parameters whose names *don't* include 'bias' or
    # 'LayerNorm.weight'; these get weight decay.
    {'params': [p for n, p in param_optimizer if not any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.01},
    # Filter for parameters whose names *do* include those tokens; these are
    # exempt from weight decay.
    {'params': [p for n, p in param_optimizer if any(nd in n for nd in no_decay)],
     'weight_decay_rate': 0.0}
]
# Note - `optimizer_grouped_parameters` only includes the parameter values, not
# the names.
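
For context, here is a rough sketch of how I understand these groups get consumed. This is my own paraphrase, not the notebook's exact next cell: it assumes the `model`, `no_decay`, and `optimizer_grouped_parameters` variables defined above, and uses plain `torch.optim.AdamW` instead of whatever BERT-specific optimizer the notebook actually constructs.

import torch

# Sanity check: count how many tensors land in each group.
decay_names = [n for n, p in model.named_parameters()
               if not any(nd in n for nd in no_decay)]
exempt_names = [n for n, p in model.named_parameters()
                if any(nd in n for nd in no_decay)]
print(len(decay_names), 'tensors get weight decay')
print(len(exempt_names), 'tensors (biases, LayerNorm weights) are exempt')

# Each dict becomes its own parameter group inside the optimizer.
# torch.optim.AdamW reads the key 'weight_decay' from each group; as far as
# I can tell, a key it does not recognize (like 'weight_decay_rate') is
# simply ignored, so I rename it before handing the groups over.
groups = [{'params': g['params'], 'weight_decay': g['weight_decay_rate']}
          for g in optimizer_grouped_parameters]
optimizer = torch.optim.AdamW(groups, lr=2e-5)
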
Can anyone please explain how this weight decay and these parameter groups affect the model, or point me to a reference where I can learn more about it?