I am trying to predict nationality and sentiment from tweets, and I use the code below to set the dropout rate. I have tested dropout rates of 0.1, 0.3 and 0.5, but the model still overfits. I was not sure whether to also set the dropout rate for the attention probabilities, so I commented out that line.
from transformers import BertConfig, BertForSequenceClassification

# Helper and Hyper are project-specific modules (logging and hyperparameters).

def load_bert_model():
    # Load BertForSequenceClassification, the pretrained BERT model with a single
    # linear classification layer on top.
    Helper.printline(f'Loading {Hyper.model_name_short} model using {Hyper.model_name} ...')
    _config = set_dropout()
    model = BertForSequenceClassification.from_pretrained(
        Hyper.model_name,  # Use the 12-layer BERT model, with an uncased vocab.
        config=_config
    )
    return model

def set_dropout():
    # Start from the checkpoint's own config so the architecture settings
    # match the pretrained weights, then override only what is needed.
    config = BertConfig.from_pretrained(Hyper.model_name)
    config.num_labels = Hyper.num_labels  # Labels are either positive or negative sentiment, and country.
    # Dropout on the attention probabilities is left at its default for now.
    #config.attention_probs_dropout_prob = Hyper.dropout_rate
    config.hidden_dropout_prob = Hyper.dropout_rate
    config.output_attentions = False  # Do not return attention weights.
    config.output_hidden_states = False  # Do not return all hidden states.
    return config
Because the training loss decreases every epoch, I think it must be possible to improve the test loss as well, but I am not sure what to try next. I am currently training for 4 epochs; should I stick to one?
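To make the epoch question concrete, here is a minimal sketch of early stopping on validation loss, which picks the epoch count from the data instead of fixing it at 1 or 4. The function name `best_epoch`, its `patience` and `min_delta` parameters, and the loss values are all hypothetical and only illustrate the idea; they are not part of the code above and not real results.

```python
def best_epoch(val_losses, patience=1, min_delta=0.0):
    """Return (best_index, stop_index) for a list of per-epoch validation losses.

    Training is considered stopped once the loss has failed to improve by more
    than min_delta for `patience` consecutive epochs. Indices are 0-based.
    """
    best_loss = float("inf")
    best_idx = -1
    bad_epochs = 0
    for idx, loss in enumerate(val_losses):
        if loss < best_loss - min_delta:
            # New best validation loss: remember it and reset the counter.
            best_loss = loss
            best_idx = idx
            bad_epochs = 0
        else:
            bad_epochs += 1
            if bad_epochs >= patience:
                # No improvement for `patience` epochs in a row: stop here.
                return best_idx, idx
    # Never triggered: the run simply ended after the last epoch.
    return best_idx, len(val_losses) - 1


# Made-up validation losses for 4 epochs (illustrative only).
example_losses = [0.62, 0.55, 0.58, 0.64]
best, stopped = best_epoch(example_losses, patience=1)
print(f"best epoch: {best + 1}, stopped after epoch: {stopped + 1}")
```

In a real training loop the model weights would be saved whenever a new best validation loss is seen, and the saved checkpoint from the best epoch would then be used for testing.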