Initializing the weights of the final layer of e.g. BertForTokenClassification with a manual seed

First off, I’m wondering how the final layer is initialized in the first place when I load my model using BertForTokenClassification.from_pretrained('bert-base-uncased').
Most of the model obviously loads the pretrained weights, but where does the final layer, in this case the linear layer that takes in the hidden states for each token, get its weights? Is it a new random set of weights each time I load bert-base-uncased?

And is there a way for me to set a manual seed so that I get the same initialization for this final layer every time with that seed? The initialization of the final layer may affect the results of fine-tuning, so I would like to have control over it if I can, and compare how different initializations do.


In this link, if you search for the BertForTokenClassification class, you’ll see there is a call to the init_weights() function after the architecture is defined.
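
For reference, the __init__ of that class looks roughly like this (paraphrased; the exact code depends on your transformers version):

class BertForTokenClassification(BertPreTrainedModel):
    def __init__(self, config):
        super().__init__(config)
        self.num_labels = config.num_labels

        self.bert = BertModel(config)
        self.dropout = nn.Dropout(config.hidden_dropout_prob)
        # The classification head: hidden states -> per-token logits
        self.classifier = nn.Linear(config.hidden_size, config.num_labels)

        # Randomly initializes every module; from_pretrained() then
        # overwrites the weights that exist in the checkpoint (the BERT
        # encoder), so the classifier keeps its random initialization
        self.init_weights()

So yes, the classifier weights are a new random draw every time you call from_pretrained().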

And to reproduce the results, have you tried setting the seed?

BERT layers are initialized roughly as follows; this is the gist of the _init_weights method of BertPreTrainedModel in transformers (details vary between versions):
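
def _init_weights(self, module):
    """Initialize the weights."""
    if isinstance(module, nn.Linear):
        # Linear layers (including the token-classification head) are
        # drawn from a normal distribution with std initializer_range
        # (0.02 for bert-base-uncased)
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
        if module.bias is not None:
            module.bias.data.zero_()
    elif isinstance(module, nn.Embedding):
        module.weight.data.normal_(mean=0.0, std=self.config.initializer_range)
    elif isinstance(module, nn.LayerNorm):
        module.bias.data.zero_()
        module.weight.data.fill_(1.0)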

You can set the seed before initializing the model, like so:

import os
import random
from typing import Optional

import numpy as np
import torch


def set_seed(seed: Optional[int] = None):
    """Set all seeds to make results reproducible (deterministic mode).
       Does nothing when seed is None (non-deterministic mode).
    :param seed: an integer of your choosing
    """
    if seed is not None:
        torch.manual_seed(seed)
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        np.random.seed(seed)
        random.seed(seed)
        os.environ['PYTHONHASHSEED'] = str(seed)
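
Putting it together, a minimal sketch (the seed value and num_labels here are just placeholders for your own setup):

from transformers import BertForTokenClassification

# Seed all RNGs before from_pretrained(): the classifier head is
# created and randomly initialized inside that call
set_seed(42)
model = BertForTokenClassification.from_pretrained(
    'bert-base-uncased',
    num_labels=9,  # placeholder: use your own number of labels
)

With the same seed you get the same classifier initialization on every run, so you can compare different initializations by changing just that one value.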
