I am pre-training RobertaForMaskedLM on my own custom dataset. I want to implement the layer-wise learning rate decay described in https://github.com/aws-health-ai/multi_domain_lm#learning-rate-control, corresponding to the paper "An Empirical Investigation Towards Efficient Multi-Domain Language Model Pre-training". Is there an easy way to make the learning rate decay with layer depth towards the input using transformers.Trainer?
I have the same question
There is nothing in the lib for this, but you can pass your own optimizer and scheduler to the Trainer.
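A minimal sketch of that idea, not the paper's exact recipe: build one parameter group per depth level (embeddings, each encoder layer, LM head), shrink the learning rate by a constant factor per layer towards the input, and hand the resulting optimizer/scheduler pair to the Trainer via its `optimizers` argument. The values `base_lr` and `decay`, and the way the groups are formed, are illustrative assumptions.

```python
import torch
from transformers import (
    RobertaForMaskedLM,
    Trainer,
    TrainingArguments,
    get_linear_schedule_with_warmup,
)

model = RobertaForMaskedLM.from_pretrained("roberta-base")

base_lr = 5e-5   # assumed LR for the topmost encoder layer and the LM head
decay = 0.95     # assumed multiplicative decay per layer towards the input

# One parameter group per depth level: embeddings, then each encoder layer.
layers = [model.roberta.embeddings] + list(model.roberta.encoder.layer)
param_groups = []
for depth, layer in enumerate(layers):
    # layers closer to the input (smaller depth) get a smaller learning rate
    lr = base_lr * decay ** (len(layers) - 1 - depth)
    param_groups.append({"params": list(layer.parameters()), "lr": lr})
# the LM head keeps the base learning rate
param_groups.append({"params": list(model.lm_head.parameters()), "lr": base_lr})

optimizer = torch.optim.AdamW(param_groups, lr=base_lr)

training_args = TrainingArguments(
    output_dir="out", max_steps=10_000, warmup_steps=500
)
scheduler = get_linear_schedule_with_warmup(
    optimizer,
    num_warmup_steps=training_args.warmup_steps,
    num_training_steps=training_args.max_steps,
)

# The scheduler scales every group's base LR together, so the per-layer
# ratios set above are preserved throughout training.
trainer = Trainer(
    model=model,
    args=training_args,
    optimizers=(optimizer, scheduler),
    # train_dataset=..., data_collator=... as usual for masked-LM pre-training
)
```

The same grouping trick works for fine-tuning; only the model head and the parameter groups you choose to include would change.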
Hello, I have the same question. I'm fine-tuning RoBERTa-large for a relation extraction (RE) task, and the paper I'm referencing uses layer-wise learning rate decay.
It seems I have to write my own optimizer and scheduler to get layer-wise learning rate decay. Could you tell me how you implemented yours?