How to parameter-efficiently fine-tune the decoder in an encoder-decoder model?

I have been trying to fine-tune the decoder of a large encoder-decoder model for machine translation by inserting an adapter after each FFN block, but the BLEU score is poor compared to full fine-tuning.
Can anyone suggest a better method, or point me to some resources on fine-tuning the decoder of such a model?
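
For context, this is roughly the setup I mean. A minimal sketch of a bottleneck adapter (the class name `Adapter`, the `bottleneck_dim` size, and the pre-norm placement are my own choices for illustration, not from any library):

    import torch.nn as nn

    class Adapter(nn.Module):
        # Bottleneck adapter inserted after a decoder FFN block:
        # layer norm -> down-projection -> non-linearity -> up-projection,
        # with a residual connection around the whole thing.
        def __init__(self, hidden_dim, bottleneck_dim=64):
            super().__init__()
            self.norm = nn.LayerNorm(hidden_dim)
            self.down = nn.Linear(hidden_dim, bottleneck_dim)
            self.act = nn.ReLU()
            self.up = nn.Linear(bottleneck_dim, hidden_dim)

        def forward(self, x):
            # The residual keeps the pretrained FFN output intact,
            # so the adapter starts close to an identity mapping.
            return x + self.up(self.act(self.down(self.norm(x))))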

Have you managed to fix it? I faced the exact same problem.

Yes, it turned out to be an issue with the initialization of the layer norms in the network. Manually initializing them fixed it.

Could you please share the notebook with me?

I am not using a notebook, but I can share the code I use to initialize the new layer norm parameters:

    from torch.nn import init

    for name, param in model.named_parameters():
        if "language_adaptor" in name and "norm" in name:
            if "weight" in name:  # layer norm scale
                print("Initializing {} to ones".format(name))
                init.ones_(param.data)
            elif "bias" in name:  # layer norm bias
                print("Initializing {} to zeros".format(name))
                init.zeros_(param.data)
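
One more thing: make sure only the adapter parameters are trainable during fine-tuning, so the pretrained weights stay frozen. A minimal sketch, assuming the same `language_adaptor` naming as above:

    # Freeze everything except the adapter parameters
    for name, param in model.named_parameters():
        param.requires_grad = "language_adaptor" in name

    trainable = sum(p.numel() for p in model.parameters() if p.requires_grad)
    total = sum(p.numel() for p in model.parameters())
    print("Trainable parameters: {} / {}".format(trainable, total))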