How to parameter-efficiently fine-tune the decoder in an encoder-decoder model?

I have been trying to fine-tune the decoder of a large encoder-decoder model for machine translation by inserting an adapter after each FFN block, but the BLEU score is poor compared to full fine-tuning.
Can anyone suggest a better method, or point me to some resources on fine-tuning the decoder of such a model?
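
For context, this is roughly the setup I mean. A minimal sketch of a bottleneck adapter (the class name `Adapter`, the `bottleneck_dim` size, and the pre-norm placement are my own choices for illustration, not from any library):

    import torch.nn as nn

    class Adapter(nn.Module):
        # Bottleneck adapter inserted after a decoder FFN block:
        # layer norm -> down-projection -> non-linearity -> up-projection,
        # with a residual connection around the whole thing.
        def __init__(self, hidden_dim, bottleneck_dim=64):
            super().__init__()
            self.norm = nn.LayerNorm(hidden_dim)
            self.down = nn.Linear(hidden_dim, bottleneck_dim)
            self.act = nn.ReLU()
            self.up = nn.Linear(bottleneck_dim, hidden_dim)

        def forward(self, x):
            # The residual keeps the pretrained FFN output intact,
            # so the adapter starts close to an identity mapping.
            return x + self.up(self.act(self.down(self.norm(x))))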

Have you managed to fix it? I faced the exact same problem.

Yes, it turned out to be an issue with the initialization of the layer norms in the network. Manually initializing them fixed it.

Could you please share the notebook with me?

I am not using a notebook, but I can share the code I use to initialize the new layer norm parameters:

    from torch.nn import init

    for name, param in model.named_parameters():
        if "language_adaptor" in name and "norm" in name:
            if "weight" in name:  # layer norm scale
                print("Initializing {} to ones".format(name))
                init.ones_(param.data)
            elif "bias" in name:  # layer norm bias
                print("Initializing {} to zeros".format(name))
                init.zeros_(param.data)
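
One more thing: make sure only the adapter parameters are trainable during fine-tuning, so the pretrained weights stay frozen. A minimal sketch, assuming the same `language_adaptor` naming as above:

    # Freeze everything except the adapter parameters
    for name, param in model.named_parameters():
        param.requires_grad = "language_adaptor" in name

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print("Trainable parameters: {} / {}".format(trainable, total))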