How to fine-tune "openai-gpt" model for sequence classification?


It is a little embarrassing that the fine-tuning process is still not clean, and let me explain what I mean by that.

If we follow the tutorial for fine-tuning BERT in this link (Fine-tune a pretrained model) but swap in "openai-gpt", we first get an error in the tokenization step asking us to add a pad token, because the original tokenizer doesn't have one. That's not a big issue, so I added one (tokenizer.pad_token = pad_token where pad_token = '[pad]'). Tokenization then goes fine.
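For reference, the tokenization part I ended up with looks roughly like this (a minimal sketch; the "[PAD]" string is just my arbitrary choice of pad token, and I'm using add_special_tokens so the token is actually added to the vocabulary rather than only set as an attribute):

```python
from transformers import AutoTokenizer

# openai-gpt ships with essentially no special tokens, so pad_token starts out unset.
tokenizer = AutoTokenizer.from_pretrained("openai-gpt")

# Register a pad token through add_special_tokens so it becomes a genuinely
# new vocabulary entry ("[PAD]" is an arbitrary choice; any unused string works).
tokenizer.add_special_tokens({"pad_token": "[PAD]"})

# Padding now works when batch-encoding sequences of different lengths.
batch = tokenizer(["short", "a slightly longer sentence"], padding=True)
```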

Here comes the crazy part. When I load the sequence classification model with model = AutoModelForSequenceClassification.from_pretrained("openai-gpt", num_labels=2), I again get an error during training that there is no padding token! So I set model.config.pad_token_id = tokenizer.pad_token_id, but assigning the id this way doesn't seem to add an embedding for the new token!
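If I understand correctly, the missing piece seems to be model.resize_token_embeddings, which grows the embedding matrix to cover tokens added after pretraining. Something like this (an untested sketch; "[PAD]" is again my own choice of pad string):

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
tokenizer.add_special_tokens({"pad_token": "[PAD]"})  # arbitrary pad string

model = AutoModelForSequenceClassification.from_pretrained("openai-gpt", num_labels=2)

# Grow the embedding matrix so the newly added [PAD] token gets its own row;
# setting model.config.pad_token_id alone does not allocate one.
model.resize_token_embeddings(len(tokenizer))
model.config.pad_token_id = tokenizer.pad_token_id
```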

Now here is my simple question: since GPT models are autoregressive, do we really need [pad] tokens for learning at all? If we do, would it be too much to ask the HuggingFace community to provide a blog post about these nuances of fine-tuning? Otherwise, all these blog posts on the comparatively easy case of fine-tuning BERT are of no (in fact negative) use if someone has to spend many hours figuring out the small details.

I encountered the same problem recently and used model.config.pad_token_id = tokenizer.eos_token_id. Since we don't want to compute loss on pad tokens, this was sufficient.

Actually, tokenizer.eos_token_id isn't defined for openai-gpt either, so that doesn't help.
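This is easy to check: the openai-gpt tokenizer ships with essentially no special tokens besides the unk token, so there is no eos id to reuse as padding, and adding a new pad token (and resizing the model's embeddings to match) appears to be the only route. A quick sketch:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-gpt")

# Neither an eos nor a pad token is defined for this checkpoint.
print(tokenizer.eos_token)  # None
print(tokenizer.pad_token)  # None
```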