Learning rate for XLM-R followed by linear layers

cramraj8 · March 16, 2022, 4:51pm

When we design an encoder that has XLM-R base followed by a linear layer (or some other parameter blocks), do we have to give different learning rates to XLM-R and the rest of the model during training, or the same one? My XLM-R alone converges with a learning rate of 5e-6. Should I use values in the 1e-3 range for the rest of the model (excluding XLM-R), or can those layers still be trained with the very low learning rate (5e-6)?

Besides, are there any modifications I have to make to the entire encoder (XLM-R with additional randomly initialized layers) during training?
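
For concreteness, here is a minimal sketch of what the two-learning-rate setup would look like with PyTorch optimizer parameter groups. It only shows how such a split could be configured, not whether it is the right choice. The `EncoderWithHead` class, the `backbone`/`head` attribute names, and the output size of 128 are hypothetical, and a small stand-in module takes the place of XLM-R so the snippet runs without downloading weights. The 5e-6 and 1e-3 values are the ones mentioned above.

```python
import torch
from torch import nn


class EncoderWithHead(nn.Module):
    """Hypothetical wrapper: pretrained backbone + randomly initialized linear layer."""

    def __init__(self, backbone: nn.Module, hidden_size: int, out_dim: int):
        super().__init__()
        self.backbone = backbone
        self.head = nn.Linear(hidden_size, out_dim)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.head(self.backbone(x))


# Stand-in for XLM-R base (hidden size 768) so the sketch is self-contained.
# With the real model, forward() would also need to pick a sentence vector
# out of the model output before passing it to the linear layer.
backbone = nn.Sequential(nn.Linear(768, 768), nn.Tanh())
model = EncoderWithHead(backbone, hidden_size=768, out_dim=128)

# One parameter group per block, each with its own learning rate.
optimizer = torch.optim.AdamW(
    [
        {"params": model.backbone.parameters(), "lr": 5e-6},  # pretrained part
        {"params": model.head.parameters(), "lr": 1e-3},      # new layers
    ]
)

group_lrs = [group["lr"] for group in optimizer.param_groups]
output_shape = tuple(model(torch.zeros(2, 768)).shape)
```

Using the same learning rate for everything would just mean passing `model.parameters()` with a single `lr` instead of the two groups.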
