First off, I’m wondering how the final layer is initialized in the first place when I load my model using BertForTokenClassification.from_pretrained('bert-base-uncased').
Most of the model obviously loads its weights from pretraining, but where does the final layer (in this case, the linear layer that takes in the hidden states for each token) get its weights? Is it a new random set of weights each time I load bert-base-uncased?
And is there a way for me to set a manual seed so that I get the same initialization for this final layer every time with that seed? The initialization of the final layer may affect the fine-tuning results, so I would like to have control over it if I can, and compare how different initializations perform.
Yes, the classification head is not part of the pretrained checkpoint, so it is randomly initialized each time you call from_pretrained (this is what the "Some weights ... are newly initialized" warning refers to). You can set the seed before initializing the model, like so:
import os
import random
from typing import Optional
import numpy as np
import torch

def set_seed(seed: Optional[int] = None):
    """Set all seeds to make results reproducible (deterministic mode).
    When seed is None, deterministic mode is not enabled.
    :param seed: an integer of your choosing
    """
    if seed is not None:
        torch.manual_seed(seed)  # seeds the RNG that initializes new layers
        torch.cuda.manual_seed_all(seed)
        torch.backends.cudnn.deterministic = True
        torch.backends.cudnn.benchmark = False
        np.random.seed(seed)
        random.seed(seed)
        os.environ['PYTHONHASHSEED'] = str(seed)
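To see why this works, here is a minimal sketch (assuming only PyTorch is installed; the layer sizes are illustrative): the token-classification head is just a freshly constructed nn.Linear, so seeding torch's global RNG right before construction makes its random weights repeat across runs, exactly as calling set_seed() before from_pretrained() would.

```python
import torch
import torch.nn as nn

def fresh_head(seed: int) -> nn.Linear:
    # Seeding before the layer is constructed fixes its random init,
    # analogous to calling set_seed() before from_pretrained().
    torch.manual_seed(seed)
    # hidden_size -> num_labels; 768 and 9 are example values
    return nn.Linear(768, 9)

a = fresh_head(42)
b = fresh_head(42)
print(torch.equal(a.weight, b.weight))  # prints True: same seed, same init
```

With different seeds you get different initializations, which lets you compare how each one fine-tunes.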