Question about GPT2LMHeadModel, GPT2ForSequenceClassification

Hi, I have some questions about the GPT-2 models in Hugging Face Transformers.

  1. In order to generate a sentence autoregressively, should I use GPT2LMHeadModel rather than GPT2Model (maybe because there is no LM head on GPT2Model)?

  2. Does GPT2ForSequenceClassification also have an LM head, or just a classification head at the end?
    If it also has an LM head, can I also use GPT2ForSequenceClassification for generating a sentence?
    How is GPT2ForSequenceClassification pretrained? Were there two learning objectives, an LM loss
    and a classification loss?

Thanks in advance for your answer!

=> You should use GPT2LMHeadModel, since it adds the language modeling head on top of GPT2Model. To do language modeling, you need a language modeling head on top of the base model.
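A minimal sketch of autoregressive generation with GPT2LMHeadModel (the checkpoint name, prompt and generation settings here are just illustrative):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Encode a prompt and let the model generate a continuation token by token
inputs = tokenizer("Hello, my name is", return_tensors="pt")
output_ids = model.generate(
    **inputs,
    max_new_tokens=20,
    do_sample=False,
    pad_token_id=tokenizer.eos_token_id,  # GPT-2 has no pad token by default
)
print(tokenizer.decode(output_ids[0], skip_special_tokens=True))
```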

GPT2ForSequenceClassification doesn’t have a language modeling head. Instead, it just uses a classification head. It uses the last token to do the classification, as other causal models (e.g. GPT-1) do.

So basically, let’s say you have the sentence “hello world”, and it gets tokenized (using GPT2Tokenizer) into [“he”, “llo”, “world”], then you first forward the tokens through the base model to get final hidden states for each of the tokens, and then you place your classifier head on top of the final hidden state of the last token (in this case, the token “world”). This final hidden state is a vector (for instance of size 768, assuming you’re using a GPT-2 base-sized model), and the classifier turns it into a vector of size 2 (assuming you have 2 labels).
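To make that concrete, here is a rough sketch of what happens under the hood, using the base GPT2Model plus a hypothetical 2-label linear head (the real GPT2ForSequenceClassification also handles padding and picks the last non-padded token):

```python
import torch
from transformers import GPT2Model, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
base_model = GPT2Model.from_pretrained("gpt2")

# Hypothetical classifier on top of the last token's final hidden state
classifier = torch.nn.Linear(base_model.config.hidden_size, 2)

inputs = tokenizer("hello world", return_tensors="pt")
hidden_states = base_model(**inputs).last_hidden_state  # shape (batch, seq_len, 768)
last_token_state = hidden_states[:, -1, :]              # final hidden state of the last token
logits = classifier(last_token_state)                   # shape (batch, 2)
```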

No, for generating language you’ll have to use GPT2LMHeadModel.

GPT-2 was pre-trained to predict the next token on a very large corpus of text. So basically GPT2LMHeadModel was used for pre-training the model. If you then want to use GPT-2 for sequence classification, you can throw away the language modeling head and place a (randomly initialized) sequence classification head on top instead: that’s what GPT2ForSequenceClassification is doing.
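If you want the ready-made version, something along these lines should work; the classification head is freshly initialized, so you would fine-tune it on your labeled data (note that GPT-2 has no padding token by default, so one common workaround, assumed here, is to reuse the EOS token):

```python
from transformers import GPT2ForSequenceClassification, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2ForSequenceClassification.from_pretrained("gpt2", num_labels=2)

# Reuse the EOS token for padding, since GPT-2 defines no pad token
tokenizer.pad_token = tokenizer.eos_token
model.config.pad_token_id = tokenizer.pad_token_id

inputs = tokenizer("I really liked this movie", return_tensors="pt")
logits = model(**inputs).logits  # shape (batch, num_labels); the classification head is randomly initialized
```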

Alternatively, you can treat sequence classification as a language modeling problem, by “prompting” the model (that’s how GPT-3 and friends also treat sequence classification nowadays). You can for instance train the model to generate the class after your prompt, like:

“classify: I really liked this movie”

=> and then GPT2LMHeadModel should generate “positive”.
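A rough sketch of that prompting setup, assuming you fine-tune GPT2LMHeadModel on texts where the label is appended after the prompt (the `=> positive` format is just an illustrative choice, not anything prescribed by the library):

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# One hypothetical training example: the model learns to continue the prompt with the label
text = "classify: I really liked this movie => positive" + tokenizer.eos_token
inputs = tokenizer(text, return_tensors="pt")

# With labels equal to input_ids, the model computes the standard causal LM loss
outputs = model(**inputs, labels=inputs["input_ids"])
outputs.loss.backward()
```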

@nielsr , Thank you very much for your detailed answer!

Now I know what to do. Since my model has to generate sentences and also do sequence classification, I will use GPT2LMHeadModel and attach a classification head on top of a special token that I’m going to append after the last token of each sentence.
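Concretely, I’m thinking of something like the sketch below; the `[CLS]` token name, the 2-label head and the way the two outputs are read out are my own assumptions, not something prescribed by the library:

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# Hypothetical special classification token appended after each sentence
tokenizer.add_special_tokens({"cls_token": "[CLS]"})
model.resize_token_embeddings(len(tokenizer))

# Randomly initialized 2-label head on top of the special token's hidden state
classifier = torch.nn.Linear(model.config.hidden_size, 2)

text = "hello world" + tokenizer.cls_token
inputs = tokenizer(text, return_tensors="pt")

outputs = model(**inputs, output_hidden_states=True)
lm_logits = outputs.logits                       # used for the generation objective
cls_state = outputs.hidden_states[-1][:, -1, :]  # hidden state of the appended [CLS] token
cls_logits = classifier(cls_state)               # used for the classification objective
```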