Learning rate for further pretraining BERT on the masked language modeling task

I want to further pretrain BERT on my corpus. Is there a standard or typical value for the learning rate that is used when training on the masked language modeling task?
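For context, here is a minimal sketch of the kind of continued-pretraining setup I have in mind, using Hugging Face `transformers` and `datasets`. The file name `my_corpus.txt` is just a placeholder, and the `learning_rate` value shown is only an example guess, since that is exactly the hyperparameter I am asking about:

```python
# Minimal sketch: continued MLM pretraining of BERT on a plain-text corpus.
# The learning rate below is a placeholder, not a recommended value.
from transformers import (
    AutoTokenizer,
    AutoModelForMaskedLM,
    DataCollatorForLanguageModeling,
    Trainer,
    TrainingArguments,
)
from datasets import load_dataset

tokenizer = AutoTokenizer.from_pretrained("bert-base-uncased")
model = AutoModelForMaskedLM.from_pretrained("bert-base-uncased")

# "my_corpus.txt" is a hypothetical file with one document per line.
dataset = load_dataset("text", data_files={"train": "my_corpus.txt"})["train"]

def tokenize(batch):
    return tokenizer(batch["text"], truncation=True, max_length=512)

tokenized = dataset.map(tokenize, batched=True, remove_columns=["text"])

# Dynamic masking of 15% of tokens, as in standard BERT-style MLM training.
collator = DataCollatorForLanguageModeling(tokenizer=tokenizer, mlm_probability=0.15)

args = TrainingArguments(
    output_dir="bert-further-pretrained",
    learning_rate=5e-5,          # <-- the hyperparameter in question
    warmup_ratio=0.1,
    weight_decay=0.01,
    per_device_train_batch_size=16,
    num_train_epochs=3,
)

trainer = Trainer(
    model=model,
    args=args,
    train_dataset=tokenized,
    data_collator=collator,
)
trainer.train()
```

Should the learning rate here be closer to the value used in the original pretraining run, or to the smaller values typically used for downstream fine-tuning?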