Increasing validation loss even with small learning rate - RoBERTa

I am fine-tuning RoBERTa and using Trainer.hyperparameter_search (transformers 3.5.x) to optimize a couple of hyperparameters. For some reason, even at very small learning rates (e.g., 5e-6), my validation loss increases epoch over epoch.

def hyperparameter_space(trial):
    return {
        "learning_rate": trial.suggest_float("learning_rate", 1e-6, 1e-3, log=True),
        "other_weight": trial.suggest_float("other_weight", 0.1, 1.0, log=False),
    }

hp_search_output = my_trainer.hyperparameter_search(
    hp_space=hyperparameter_space,
    compute_objective=lambda x: x["eval_loss"],
    study_name="2021-02-26_roberta_3_epochs",
)

Batch size is 16, with a total of ~80,000 input sequences of length 512.
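For scale, here is the rough step-count arithmetic implied by those numbers (the 3 epochs are taken from the study name; these are just back-of-the-envelope figures, not part of my training code):

    # Back-of-the-envelope training scale, from the numbers above
    num_examples = 80_000
    batch_size = 16
    num_epochs = 3

    steps_per_epoch = num_examples // batch_size  # 5,000 optimizer steps per epoch
    total_steps = steps_per_epoch * num_epochs    # 15,000 steps over 3 epochs
    print(steps_per_epoch, total_steps)           # 5000 15000

So each trial runs on the order of 15,000 optimizer steps, which should be plenty for the eval loss to move in a meaningful direction.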

Any suggestions for fixing/investigating this?

FWIW, this is similar to an unanswered question here: