Is there any reason why GPT-Neo would behave fundamentally differently from GPT2?

Hey guys. I'm running some experiments as part of a research project. The code was originally written for GPT-Neo 1.3B, but one baseline we want to compare against only supports GPT2-XL, so I added support for it to our code (i.e., I just added a clause along the lines of `if model_name == "gpt2": model = GPT2LMHeadModel.from_pretrained("gpt2-xl")`). Both models are of course loaded from Hugging Face.
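For reference, the model-selection logic looks roughly like this. This is a simplified sketch, not my exact code, and the function name and variable names are just illustrative:

```python
from transformers import (
    AutoTokenizer,
    GPT2LMHeadModel,
    GPT2Tokenizer,
    GPTNeoForCausalLM,
)

def load_model(model_name: str):
    """Return (model, tokenizer) for the requested backbone."""
    if model_name == "gpt2":
        # Newly added branch for the GPT2-XL baseline
        model = GPT2LMHeadModel.from_pretrained("gpt2-xl")
        tokenizer = GPT2Tokenizer.from_pretrained("gpt2-xl")
    else:
        # Original setup: GPT-Neo 1.3B
        model = GPTNeoForCausalLM.from_pretrained("EleutherAI/gpt-neo-1.3B")
        tokenizer = AutoTokenizer.from_pretrained("EleutherAI/gpt-neo-1.3B")
    model.eval()
    return model, tokenizer
```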

The issue is that GPT2-XL produces results that are clearly wrong. It's hard to explain without walking through my code in depth, but basically I'm running a multiple-choice evaluation where the model is rewarded for assigning the highest probability to the correct label. GPT2-XL assigns exactly the same probability to all but one of the labels, and a very small probability to that remaining one (which usually isn't even the right answer anyway). On top of that, fine-tuning on the dataset has literally no effect on the results, which is bizarre.
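To make the setup concrete, the scoring step is roughly equivalent to the sketch below (again simplified and illustrative, not my actual code): each candidate answer is scored by the total log-probability the model assigns to its tokens when appended to the prompt, and the highest-scoring candidate is taken as the model's choice.

```python
import torch
import torch.nn.functional as F

def score_choice(model, tokenizer, prompt: str, choice: str) -> float:
    """Sum of log-probabilities the model assigns to the choice tokens after the prompt."""
    prompt_ids = tokenizer(prompt, return_tensors="pt").input_ids
    choice_ids = tokenizer(choice, return_tensors="pt").input_ids
    input_ids = torch.cat([prompt_ids, choice_ids], dim=1)

    with torch.no_grad():
        logits = model(input_ids).logits

    # Logits at position t predict token t+1, so drop the last position and
    # align the remaining predictions with the choice tokens.
    log_probs = F.log_softmax(logits[:, :-1, :], dim=-1)
    choice_log_probs = (
        log_probs[0, prompt_ids.shape[1] - 1 :, :]
        .gather(-1, choice_ids[0].unsqueeze(-1))
        .squeeze(-1)
    )
    return choice_log_probs.sum().item()

# Usage: pick the candidate with the highest score.
# best = max(choices, key=lambda c: score_choice(model, tokenizer, prompt, c))
```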

So my question is: is there any fundamental difference in how these two models are set up in Hugging Face that could cause errors like this? In other words, is there anything I need to change in the code to accommodate GPT2, beyond the initial `model = GPT2LMHeadModel.from_pretrained(...)` line? I'm not very familiar with Hugging Face models myself, so I'm not entirely sure. But the fact that the code runs yet produces such bad results is strange; I would expect that if something were genuinely wrong, I'd hit a tensor-size mismatch error somewhere.