- `transformers` version: 4.4.2
- Python version: 3.7
I am implementing a paper that I read, based on the Question Answering code `run_qa.py` from Hugging Face.
I added a few layers to ELECTRA, and I trained and saved only the parameters of the added layers.
When I evaluate, I load those parameters, and the rest are initialized from the parameters of the pre-trained ELECTRA model.
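For context, the saving side looks roughly like the sketch below. This is a minimal sketch, assuming the added module is exposed as `extra_qa_layer` (a hypothetical name, not the actual attribute from the paper):

```python
import os
import torch

def save_added_layer(model, checkpoint_dir):
    # Keep only the parameters of the added module and write them as the
    # checkpoint's weight file so from_pretrained() can pick them up later;
    # the missing (ELECTRA) keys are then randomly initialized on load.
    added_state = {k: v for k, v in model.state_dict().items()
                   if k.startswith("extra_qa_layer")}
    os.makedirs(checkpoint_dir, exist_ok=True)
    torch.save(added_state, os.path.join(checkpoint_dir, "pytorch_model.bin"))
    model.config.save_pretrained(checkpoint_dir)  # config.json for from_pretrained()
```

The loading side is the function below: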
```python
def load_cda_qa_model(args, phase, checkpoint=None):
    # assert phase == 'train' or phase == 'eval'
    config = CONFIG_CLASSES[args.model_type].from_pretrained(args.model_name_or_path)

    # Load the saved checkpoint; it only contains the added layers, so the
    # remaining weights are randomly initialized at this point.
    model = MODEL_FOR_QUESTION_ANSWERING[args.model_type].from_pretrained(checkpoint)

    # Load the pre-trained ELECTRA model and copy its weights over the
    # randomly initialized ones.
    tmp_electra = MODEL_FOR_QUESTION_ANSWERING['electra'].from_pretrained(args.model_name_or_path, config=config)
    electra_state_dict = tmp_electra.state_dict()
    model_state_dict = model.state_dict()
    for electra_key, electra_value in electra_state_dict.items():
        model_state_dict[electra_key] = electra_value
    model.load_state_dict(model_state_dict)

    return model
```
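For evaluation, the function is called roughly like this (the checkpoint path is a placeholder, and `args` comes from the usual argparse setup of the script):

```python
# Hypothetical call for evaluation; the checkpoint directory is a placeholder.
model = load_cda_qa_model(args, phase="eval", checkpoint="./output/checkpoint-1000")
model.to(args.device)
model.eval()
```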
The results of the two cases are:

What I want to ask is: why do the results change when the parts highlighted in red and yellow are written in a different order, even though there seems to be no difference in the code flow?
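To make the comparison concrete, here is a sketch of how the two orderings could be diffed; `load_variant_a` and `load_variant_b` are hypothetical copies of `load_cda_qa_model` with the highlighted lines swapped:

```python
import torch

# Build the model with both line orderings and report which parameters
# actually end up different after loading.
model_a = load_variant_a(args, phase="eval", checkpoint=checkpoint)
model_b = load_variant_b(args, phase="eval", checkpoint=checkpoint)

state_b = model_b.state_dict()
for key, value_a in model_a.state_dict().items():
    if not torch.equal(value_a, state_b[key]):
        print(f"parameters differ for: {key}")
```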