- Python version: 3.7
I am implementing a paper that I read, based on the Question Answering code "run_qa.py" from Hugging Face.
I added a few layers to ELECTRA, then trained and saved only the parameters of those added layers.
When I evaluate, I load those saved parameters, and the rest of the weights are initialized from the pre-trained ELECTRA model:
    def load_cda_qa_model(args, phase, checkpoint=None):
        # assert phase == 'train' or phase == 'eval'
        config = CONFIG_CLASSES[args.model_type].from_pretrained(args.model_name_or_path)
        model = MODEL_FOR_QUESTION_ANSWERING[args.model_type].from_pretrained(checkpoint)
        tmp_electra = MODEL_FOR_QUESTION_ANSWERING['electra'].from_pretrained(args.model_name_or_path, config=config)

        electra_state_dict = tmp_electra.state_dict()
        model_state_dict = model.state_dict()
        for electra_key, electra_value in electra_state_dict.items():
            model_state_dict[electra_key] = electra_value
        model.load_state_dict(model_state_dict)

        return model
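For context, my understanding is that the merging step in load_cda_qa_model is itself order-insensitive at the dict level: copying the pretrained ELECTRA entries over the checkpoint's state dict overwrites only the shared backbone keys and leaves the added-layer keys alone. A minimal sketch, with plain dicts and the key names "electra.embeddings.weight" / "added_layer.weight" as hypothetical stand-ins for real state-dict entries:

```python
from collections import OrderedDict

# Hypothetical stand-ins for state dicts: the fine-tuned checkpoint has the
# backbone keys plus the added layer; pretrained ELECTRA has only the backbone.
model_state_dict = OrderedDict([
    ("electra.embeddings.weight", [0.0]),  # stale backbone value from checkpoint
    ("added_layer.weight",        [9.9]),  # trained added layer, must survive
])
electra_state_dict = OrderedDict([
    ("electra.embeddings.weight", [1.0]),  # pretrained backbone value
])

# Copy every pretrained entry over the checkpoint, as in load_cda_qa_model.
for key, value in electra_state_dict.items():
    model_state_dict[key] = value

print(model_state_dict["electra.embeddings.weight"])  # pretrained value wins
print(model_state_dict["added_layer.weight"])         # added layer untouched
```

Since the pretrained dict's keys are a subset of the model's keys, the merged result is the same regardless of which dict is iterated first, which is why I expected the statement order not to matter.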
The results of the two cases are:
What I want to ask is: why do the results change when the order of the two highlighted statements (the red and yellow parts) is swapped, even though there seems to be no difference in the code flow?