Initializing the weights of the final layer of e.g. BertForTokenClassification with a manual seed

First off, I’m wondering how the final layer is initialized in the first place when I load my model using BertForTokenClassification.from_pretrained('bert-base-uncased').
Most of the model obviously loads the pretrained weights, but where does the final layer, in this case the linear layer that takes in the hidden states for each token, get its weights? Is it a new random set of weights each time I load bert-base-uncased?

And is there a way for me to set a manual seed so that I get the same initialization for this final layer every time with that seed? The initialization of the final layer may affect the results of fine-tuning, so I would like to have control over it if I can, and compare how different initializations do.


In this link, if you search for the BertForTokenClassification class, you’ll see there is a call to the init_weights() function after the architecture is defined.
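
For reference, the __init__ of that class looks roughly like this (paraphrased; the exact code depends on your transformers version):

class BertForTokenClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        # The classification head: hidden states -> per-token logits
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Randomly initializes every module; from_pretrained() then
        # overwrites the weights that exist in the checkpoint (the BERT
        # encoder), so the classifier keeps its random initialization
        self.init_weights()

So yes, the classifier weights are a new random draw every time you call from_pretrained().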

And to reproduce the results, have you tried setting the seed?

BERT layers are initialized roughly as follows; this is the gist of the _init_weights method of BertPreTrainedModel in transformers (details vary between versions):
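
def _init_weights(self, module):
    """Initialize the weights."""
    if isinstance(module, nn.Linear):
        # Linear layers (including the token-classification head) are
        # drawn from a normal distribution with std initializer_range
        # (0.02 for bert-base-uncased)
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
    elif isinstance(module, nn.LayerNorm):
        module.bias.data.zero_()
        module.weight.data.fill_(1.0)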

You can set the seed before initializing the model, like so:

import os
import random
from typing import Optional

import numpy as np
import torch


def set_seed(seed: Optional[int] = None):
    """Set all seeds to make results reproducible (deterministic mode).
       Does nothing when seed is None (non-deterministic mode).
    :param seed: an integer of your choosing
    """
    if seed is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        np.random.seed(seed)
        random.seed(seed)
        os.environ['PYTHONHASHSEED'] = str(seed)
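
Putting it together, a minimal sketch (the seed value and num_labels here are just placeholders for your own setup):

from transformers import BertForTokenClassification

# Seed all RNGs before from_pretrained(): the classifier head is
# created and randomly initialized inside that call
set_seed(42)
model = BertForTokenClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=9,  # placeholder: use your own number of labels
)

With the same seed you get the same classifier initialization on every run, so you can compare different initializations by changing just that one value.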
