Regularisation using Bert

arame3333 · November 16, 2021, 4:17pm

I am trying to predict nationality and sentiment from tweets, and I use the code below to set the dropout rate. I have tried dropout rates of 0.1, 0.3 and 0.5, but none of them fixes the overfitting. I am not sure whether to also set the dropout rate for attention, so I commented out that line.

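from transformers import BertConfig, BertForSequenceClassification

# Hyper and Helper are my own project modules (hyperparameters and logging).

def load_bert_model():
    # Load BertForSequenceClassification, the pretrained BERT model with a single
    # linear classification layer on top.
    Helper.printline(f'Loading {Hyper.model_name_short} model using {Hyper.model_name} ...')
    _config = set_dropout()
    model = BertForSequenceClassification.from_pretrained(
        Hyper.model_name,               # Use the 12-layer BERT model, with an uncased vocab.
        config=_config
    )

    return model

def set_dropout():
    # Start from the checkpoint's own config rather than a default BertConfig(),
    # so the architecture settings always match the pretrained weights.
    config = BertConfig.from_pretrained(Hyper.model_name)
    config.num_labels = Hyper.num_labels    # Labels are either positive or negative sentiment, and country.
    #config.attention_probs_dropout_prob = Hyper.dropout_rate   # Dropout on the attention weights (left at its default).
    config.hidden_dropout_prob = Hyper.dropout_rate             # Dropout on embeddings and layer outputs.
    config.output_attentions = False        # Do not return attention weights.
    config.output_hidden_states = False     # Do not return all hidden states.
    return config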
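
To see which dropout settings the config above changes, it can help to inspect a model built from it. This is a minimal sketch, not the code from my project: it assumes a tiny, randomly initialised BERT (the layer sizes are arbitrary and chosen only so nothing needs to be downloaded) and a dropout rate of 0.3 in place of Hyper.dropout_rate.

```python
from transformers import BertConfig, BertForSequenceClassification

# Tiny, randomly initialised model: the sizes are arbitrary assumptions,
# picked so the example runs quickly without downloading any weights.
config = BertConfig(
    hidden_size=32,
    num_hidden_layers=1,
    num_attention_heads=2,
    intermediate_size=64,
    num_labels=2,
    hidden_dropout_prob=0.3,   # stand-in for Hyper.dropout_rate
)
model = BertForSequenceClassification(config)

# The classification head has its own setting, classifier_dropout. When it is
# left unset, the head falls back to hidden_dropout_prob.
print("classifier_dropout in config:", config.classifier_dropout)
print("dropout applied before the classifier:", model.dropout.p)

# The attention dropout is a separate setting and is untouched by the line above.
print("attention_probs_dropout_prob:", config.attention_probs_dropout_prob)

# Dropout is only active in training mode; evaluation must call model.eval().
model.train()
print("dropout active while training:", model.dropout.training)
model.eval()
print("dropout active while evaluating:", model.dropout.training)
```

If the test loss is computed without calling model.eval() first, dropout stays active during evaluation and the reported loss is noisier than it should be, so that is worth ruling out before tuning the rate further.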

Because the training loss decreases every epoch, I think it should be possible to improve the test loss as well, but I am not sure what to try next. I am training for 4 epochs; should I stick to one?
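
One way to settle the epoch question empirically would be to record the loss on a held-out validation set after every epoch and keep the checkpoint from the epoch where it was lowest. The following is a minimal sketch of that selection step only. The helper name best_epoch, the patience parameter and the loss values are all hypothetical and made up for illustration; they are not results from my model.

```python
from typing import Sequence


def best_epoch(val_losses: Sequence[float], patience: int = 1) -> int:
    """Return the 1-based epoch with the lowest validation loss seen before
    the loss has failed to improve for `patience` consecutive epochs."""
    if not val_losses:
        raise ValueError("val_losses must not be empty")
    best_idx = 0
    bad_epochs = 0
    for idx in range(1, len(val_losses)):
        if val_losses[idx] < val_losses[best_idx]:
            # New best validation loss: remember it and reset the counter.
            best_idx = idx
            bad_epochs = 0
        else:
            # No improvement this epoch; stop once patience is used up.
            bad_epochs += 1
            if bad_epochs >= patience:
                break
    return best_idx + 1


# Illustrative numbers only (not measured): validation loss after epochs 1 to 4.
example_val_losses = [0.62, 0.58, 0.66, 0.74]
chosen = best_epoch(example_val_losses, patience=1)
print(f"keep the checkpoint from epoch {chosen}")
```

With a pattern like the one in the example, where validation loss rises after the second epoch while training loss keeps falling, the extra epochs only add overfitting, so stopping early would be the simpler fix compared with raising the dropout rate.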
