hi all,

I want to do hyperparameter tuning and reload my model in a loop. I've noticed that if I load the model twice in a row, as below, the second call does not give back the same model: the weights are initialized differently. Oddly, this is reproducible across executions: the first load always yields the same model, and so does each subsequent load, but within a run the first load is always != the second, and so on.

I’m going crazy. What is going on here?

```
from transformers import BertForSequenceClassification

# First load, saved to ./model1/
model = BertForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(label2id), id2label=id2label, label2id=label2id,
    output_attentions=False, output_hidden_states=False)
model.save_pretrained('./model1/')
# Second load of the same checkpoint, saved to ./model2/
model = BertForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(label2id), id2label=id2label, label2id=label2id,
    output_attentions=False, output_hidden_states=False)
model.save_pretrained('./model2/')
```
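
As a quick sanity check I also made sure the save/load round trip itself is not to blame (a minimal sketch; `model` here is still the second model from the snippet above):

```
import torch
from transformers import BertForSequenceClassification

# Sanity check: does save_pretrained -> from_pretrained preserve the weights?
reloaded = BertForSequenceClassification.from_pretrained('./model2/')
for p_mem, p_disk in zip(model.parameters(), reloaded.parameters()):
    if not torch.allclose(p_mem, p_disk):
        print("Round trip changed the weights!")
        break
else:
    print("Round trip preserves the weights.")
```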

If I then run this comparison, the weights are not the same:

```
import torch
from transformers import BertForSequenceClassification

model1 = BertForSequenceClassification.from_pretrained('./model1/')
model2 = BertForSequenceClassification.from_pretrained('./model2/')

for p1, p2 in zip(model1.parameters(), model2.parameters()):
    if not torch.allclose(p1, p2):
        print("Weights are not the same.")
        break
else:
    # for/else: only reached if the loop never hit "break"
    print("Weights are the same.")
```
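
To see where the two checkpoints actually diverge, I also printed the names of the mismatching parameters (same idea as above, just using `named_parameters()` instead of `parameters()`):

```
import torch
from transformers import BertForSequenceClassification

model1 = BertForSequenceClassification.from_pretrained('./model1/')
model2 = BertForSequenceClassification.from_pretrained('./model2/')

# List every parameter whose values differ between the two saved models.
for (name, p1), (_, p2) in zip(model1.named_parameters(), model2.named_parameters()):
    if not torch.allclose(p1, p2):
        print(f"differs: {name}")
```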

I’ve set every imaginable thing to be deterministic:

```
import os
import random
import numpy as np
import torch

seed_val = 42  # example value; I use the same seed throughout

random.seed(seed_val)
np.random.seed(seed_val)
torch.manual_seed(seed_val)
torch.cuda.manual_seed_all(seed_val)

# cuBLAS/cuDNN determinism settings
os.environ["CUBLAS_WORKSPACE_CONFIG"] = ":4096:8"
torch.use_deterministic_algorithms(True)
torch.backends.cudnn.deterministic = True
torch.backends.cudnn.benchmark = False
```
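
One hypothesis I want to rule in or out: if `from_pretrained` draws random numbers (e.g. to initialize the new classification head), the global RNG state has already advanced by the time the second load runs, even though everything is seeded once at the top. If that's the cause, re-seeding immediately before each load should make the two models identical (a sketch under that assumption, reusing `seed_val` from above):

```
import torch
from transformers import BertForSequenceClassification

# Reset the RNG state right before each load, so any randomly
# initialized parts start from the same random state.
torch.manual_seed(seed_val)
model1 = BertForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(label2id), id2label=id2label, label2id=label2id)

torch.manual_seed(seed_val)
model2 = BertForSequenceClassification.from_pretrained(
    MODEL, num_labels=len(label2id), id2label=id2label, label2id=label2id)
```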

Any help is appreciated!