Is there an easy way to apply a layer-wise decaying learning rate in the Hugging Face Trainer for RobertaForMaskedLM?

I am pre-training RobertaForMaskedLM on my own custom dataset. I want to implement the layer-wise learning rate decay described in https://github.com/aws-health-ai/multi_domain_lm#learning-rate-control, which corresponds to the paper "An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training". Is there an easy way to make the learning rate decay with layer depth towards the input using transformers.Trainer?

I have the same question

There is nothing built into the library for this, but you can create your own optimizer and scheduler and pass them to the Trainer via its `optimizers` argument.
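
For reference, here is a minimal sketch of how that could look for RobertaForMaskedLM: build one parameter group per depth level with a geometrically decaying learning rate, construct the optimizer and scheduler yourself, and hand them to Trainer through `optimizers`. The base learning rate (1e-4), decay factor (0.95), warmup/step counts, and the choice of AdamW with a linear schedule are illustrative assumptions, not the exact recipe from the linked repo.

```python
import torch
from transformers import (
    RobertaConfig,
    RobertaForMaskedLM,
    Trainer,
    TrainingArguments,
    get_linear_schedule_with_warmup,
)

model = RobertaForMaskedLM(RobertaConfig())

base_lr = 1e-4       # assumed learning rate for the top layer / LM head
decay_factor = 0.95  # assumed per-layer decay towards the input


def layerwise_param_groups(model, base_lr, decay_factor):
    """One parameter group per depth level, with the LR shrinking towards the input."""
    num_layers = model.config.num_hidden_layers
    groups, seen = [], set()

    def add(params, lr):
        # Skip tensors already assigned to a group (handles the weight tied
        # between the word embeddings and the LM head decoder).
        fresh = [p for p in params if id(p) not in seen]
        seen.update(id(p) for p in fresh)
        if fresh:
            groups.append({"params": fresh, "lr": lr})

    # Embeddings sit closest to the input, so they get the smallest rate.
    add(model.roberta.embeddings.parameters(),
        base_lr * decay_factor ** num_layers)
    # Encoder layers: the top layer keeps base_lr, lower layers decay geometrically.
    for i, layer in enumerate(model.roberta.encoder.layer):
        add(layer.parameters(), base_lr * decay_factor ** (num_layers - 1 - i))
    # The MLM head stays at the base learning rate.
    add(model.lm_head.parameters(), base_lr)
    return groups


num_training_steps = 10_000  # placeholder; match your real training length
optimizer = torch.optim.AdamW(layerwise_param_groups(model, base_lr, decay_factor))
scheduler = get_linear_schedule_with_warmup(
    optimizer, num_warmup_steps=500, num_training_steps=num_training_steps
)

trainer = Trainer(
    model=model,
    args=TrainingArguments(output_dir="out", max_steps=num_training_steps),
    # train_dataset=...,   # your tokenized corpus
    # data_collator=...,   # e.g. DataCollatorForLanguageModeling(tokenizer, mlm=True)
    optimizers=(optimizer, scheduler),  # Trainer will use these instead of creating its own
)
```

When you pass `optimizers=(optimizer, scheduler)`, the Trainer skips its default optimizer/scheduler setup, so the per-group learning rates you defined are used as-is throughout training.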