Is there an easy way to apply a layer-wise decaying learning rate in the Hugging Face Trainer for RobertaForMaskedLM?

I am pre-training RobertaForMaskedLM on my own custom dataset. I want to implement the layer-wise learning rate decay described in the paper "An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training". Is there an easy way to make the learning rate decay with layer depth towards the input using transformers.Trainer?

I have the same question

There is nothing built into the library for this, but you can pass your own optimizer and scheduler to the Trainer.
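A minimal sketch of what that could look like. The helper below assigns each parameter a learning rate that decays with depth towards the input, keyed on RobertaForMaskedLM's standard parameter-name prefixes (`roberta.embeddings.*`, `roberta.encoder.layer.{i}.*`, `lm_head.*`). The `base_lr`, `decay`, and `num_layers` values are placeholder assumptions, not values from the paper — tune them for your setup.

```python
# Sketch: per-parameter learning rates decaying with layer depth.
# Assumes RobertaForMaskedLM's naming scheme; base_lr/decay/num_layers
# below are illustrative defaults, not recommended values.

def layerwise_lr(param_name: str, base_lr: float = 1e-4,
                 decay: float = 0.95, num_layers: int = 12) -> float:
    """LR for one parameter: the top encoder layer keeps base_lr,
    each layer below it is scaled by another factor of decay, and
    the embeddings sit one step below the bottom layer."""
    if param_name.startswith("roberta.embeddings."):
        depth = num_layers                      # deepest: embeddings
    elif param_name.startswith("roberta.encoder.layer."):
        layer_idx = int(param_name.split(".")[3])
        depth = num_layers - 1 - layer_idx      # top layer -> depth 0
    else:
        depth = 0                               # e.g. lm_head: full lr
    return base_lr * decay ** depth


def build_param_groups(named_params, **kwargs):
    """Group (name, param) pairs into optimizer param groups, one
    group per distinct learning rate."""
    groups = {}
    for name, param in named_params:
        lr = layerwise_lr(name, **kwargs)
        groups.setdefault(lr, {"params": [], "lr": lr})["params"].append(param)
    return list(groups.values())
```

With a model in hand, you would build the optimizer from these groups and hand it to the Trainer, roughly: `optimizer = torch.optim.AdamW(build_param_groups(model.named_parameters()))`, then `Trainer(model=model, args=args, optimizers=(optimizer, scheduler), ...)` — the `optimizers` tuple is how Trainer accepts a custom optimizer/scheduler pair.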