Bug in BartForConditionalGeneration's initialisation of lm_head

nejf · October 16, 2021, 6:46pm

In BART’s modeling_utils.py there is the following piece of code. What it does is tie BartForConditionalGeneration’s lm_head to the token embeddings used by BART’s encoder/decoder. This is wrong, right? We just want these two parameters to be initialised the same, not to share the same value throughout training.

    def _tie_or_clone_weights(self, output_embeddings, input_embeddings):
        """Tie or clone module weights depending of whether we are using TorchScript or not"""
        if self.config.torchscript:
            output_embeddings.weight = nn.Parameter(input_embeddings.weight.clone())
        else:
            ################################
            ### NOTE THE FOLLOWING LINE ###
            ################################
            output_embeddings.weight = input_embeddings.weight

        if getattr(output_embeddings, "bias", None) is not None:
            output_embeddings.bias.data = torch.nn.functional.pad(
                output_embeddings.bias.data,
                (
                    0,
                    output_embeddings.weight.shape[0] - output_embeddings.bias.shape[0],
                ),
                "constant",
                0,
            )
        if hasattr(output_embeddings, "out_features") and hasattr(input_embeddings, "num_embeddings"):
            output_embeddings.out_features = input_embeddings.num_embeddings
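To make the difference between the two branches concrete, here is a minimal sketch in plain PyTorch. It is not the library's code: the module names (`emb_tied`, `head_tied`, `emb_clone`, `head_clone`) and the sizes are made up for illustration, and the in-place `add_` just stands in for an optimizer step.

```python
import torch
from torch import nn

vocab_size, hidden = 10, 4  # hypothetical sizes, for illustration only

# Tied (the non-TorchScript branch): both modules hold the same Parameter object.
emb_tied = nn.Embedding(vocab_size, hidden)
head_tied = nn.Linear(hidden, vocab_size, bias=False)
head_tied.weight = emb_tied.weight

# Cloned (the TorchScript branch): same initial values, independent storage.
emb_clone = nn.Embedding(vocab_size, hidden)
head_clone = nn.Linear(hidden, vocab_size, bias=False)
head_clone.weight = nn.Parameter(emb_clone.weight.clone())

tied_same_object = head_tied.weight is emb_tied.weight
clone_same_object = head_clone.weight is emb_clone.weight
clone_equal_at_init = torch.equal(head_clone.weight, emb_clone.weight)

# Stand-in for a training update: change the embedding weights in place.
with torch.no_grad():
    emb_tied.weight.add_(1.0)
    emb_clone.weight.add_(1.0)

# The tied head follows the embedding; the cloned head keeps its old values.
tied_equal_after = torch.equal(head_tied.weight, emb_tied.weight)
clone_equal_after = torch.equal(head_clone.weight, emb_clone.weight)

print(tied_same_object, clone_same_object)
print(clone_equal_at_init, tied_equal_after, clone_equal_after)
```

So with the marked line, lm_head and the embedding are one parameter that is updated together, whereas the clone branch only matches at initialisation and then drifts apart.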
