I am pre-training RobertaForMaskedLM on my own custom dataset. I want to implement the layer-wise learning rate decay described in https://github.com/aws-health-ai/multi_domain_lm#learning-rate-control, corresponding to the paper "An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training". Is there an easy way to make the learning rate decay with layer depth towards the input using transformers.Trainer?
I have the same question
There is nothing in the lib for this, but you can pass your own optimizer and scheduler to the Trainer.
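A minimal sketch of that idea, not the paper's exact recipe: build one parameter group per depth level (embeddings, each encoder layer, LM head), shrink the learning rate by a constant factor per layer towards the input, and hand the resulting optimizer/scheduler pair to the Trainer via its `optimizers` argument. The values `base_lr` and `decay`, and the way the groups are formed, are illustrative assumptions.

```python
import torch
from transformers import (
    RobertaForMaskedLM,
    Trainer,
    TrainingArguments,
    get_linear_schedule_with_warmup,
)

model = RobertaForMaskedLM.from_pretrained("roberta-base")

base_lr = 5e-5   # assumed LR for the topmost encoder layer and the LM head
decay = 0.95     # assumed multiplicative decay per layer towards the input

# One parameter group per depth level: embeddings, then each encoder layer.
layers = [model.roberta.embeddings] + list(model.roberta.encoder.layer)
param_groups = []
for depth, layer in enumerate(layers):
    # layers closer to the input (smaller depth) get a smaller learning rate
    lr = base_lr * decay ** (len(layers) - 1 - depth)
    param_groups.append({"params": list(layer.parameters()), "lr": lr})
# the LM head keeps the base learning rate
param_groups.append({"params": list(model.lm_head.parameters()), "lr": base_lr})

optimizer = torch.optim.AdamW(param_groups, lr=base_lr)

training_args = TrainingArguments(
    output_dir="out", max_steps=10_000, warmup_steps=500
)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=training_args.warmup_steps,
    num_training_steps=training_args.max_steps,
)

# The scheduler scales every group's base LR together, so the per-layer
# ratios set above are preserved throughout training.
trainer = Trainer(
    model=model,
    args=training_args,
    optimizers=(optimizer, scheduler),
    # train_dataset=..., data_collator=... as usual for masked-LM pre-training
)
```

The same grouping trick works for fine-tuning; only the model head and the parameter groups you choose to include would change.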
Hello, I have the same question. I'm fine-tuning RoBERTa-large for a relation extraction (RE) task, and the paper I'm referencing uses layer-wise learning rate decay.
It seems I have to write my own optimizer and scheduler to get layer-wise learning rate decay. Could you tell me how you implemented yours?