Regularisation using BERT

I am trying to predict nationality and sentiment from tweets, and I use the code below to set the dropout rate. I have tried dropout rates of 0.1, 0.3 and 0.5, but the overfitting problem is not being fixed. I am also not sure whether I should set the dropout rate for attention, so I commented out that line.

from transformers import BertConfig, BertForSequenceClassification

def load_bert_model():
    # Load BertForSequenceClassification, the pretrained BERT model with a single 
    # linear classification layer on top. 
    Helper.printline(f'Loading {Hyper.model_name_short} model using {Hyper.model_name} ...')
    _config = set_dropout()
    model = BertForSequenceClassification.from_pretrained(
        Hyper.model_name,               # Use the 12-layer BERT model, with an uncased vocab.
        config=_config
    )
    return model

def set_dropout():
    config = BertConfig()
    config.num_labels = Hyper.num_labels    # Labels are either positive or negative sentiment, and country.
    #config.attention_probs_dropout_prob = Hyper.dropout_rate
    config.hidden_dropout_prob = Hyper.dropout_rate
    config.output_attentions = False        # Do not return attentions weights.
    config.output_hidden_states = False     # Do not return all hidden-states.
    return config
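For reference, here is a minimal standalone sketch of the config I am building (the dropout and label values are stand-ins for my Hyper settings). I understand a bare BertConfig() uses the library defaults, whereas BertConfig.from_pretrained(Hyper.model_name) would pull the checkpoint's own settings; both accept the dropout probabilities as keyword arguments:

```python
from transformers import BertConfig

# Hypothetical stand-ins for the Hyper.* values used in the question.
DROPOUT_RATE = 0.3
NUM_LABELS = 2

# Both dropout probabilities can be passed as keyword arguments;
# attention_probs_dropout_prob is the attention dropout that the
# commented-out line in set_dropout() would control.
config = BertConfig(
    hidden_dropout_prob=DROPOUT_RATE,
    attention_probs_dropout_prob=DROPOUT_RATE,
    num_labels=NUM_LABELS,
)
print(config.hidden_dropout_prob)  # → 0.3
```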

Because the training loss per epoch is decreasing, I think it should be possible to improve the testing loss as well, but I am not sure what to try next. I am currently training for 4 epochs; should I use fewer, perhaps just one?
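To make the pattern I am describing concrete, here is a toy illustration with made-up loss values, where training loss keeps falling every epoch but validation loss bottoms out and then rises (the point at which overfitting begins):

```python
# Hypothetical per-epoch losses (not my real numbers) illustrating how
# the epoch count could be chosen: pick the epoch where the validation
# loss is lowest, rather than fixing the number of epochs in advance.
train_losses = [0.62, 0.41, 0.27, 0.16]  # decreases every epoch
val_losses = [0.55, 0.48, 0.51, 0.60]    # lowest at epoch 2, then rises

best_epoch = min(range(len(val_losses)), key=lambda i: val_losses[i]) + 1
print(best_epoch)  # → 2
```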