Hi,
It is really embarrassing that the process of fine-tuning is still not clean, so let me share what I mean by that.
If we follow the tutorial in this link (Fine-tune a pretrained model) but fine-tune "openai-gpt" instead of BERT, we first get an error in the tokenization step asking us to add a pad token, because the original tokenizer doesn't have one. That's not a big issue, since I just add a pad token (tokenizer.pad_token = pad_token where pad_token = "[pad]"). All goes well with tokenization.
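For the record, here is roughly what the tokenizer side looks like. This is a minimal sketch; I use add_special_tokens here, which is a slightly more explicit variant of the plain pad_token assignment above, and the "[pad]" string is just my choice of token:

```python
from transformers import AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-gpt")

# openai-gpt ships without a pad token, so register one explicitly.
# This grows the tokenizer's vocabulary by one entry.
tokenizer.add_special_tokens({"pad_token": "[pad]"})

# Padding now works during tokenization.
batch = tokenizer(
    ["a short example", "a slightly longer example sentence"],
    padding=True,
    truncation=True,
    return_tensors="pt",
)
```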
Here comes the crazy part. When I create the sequence classification model with model = AutoModelForSequenceClassification.from_pretrained("openai-gpt", num_labels=2), I again get an error during training that there is no padding token! So again I add it via model.config.pad_token_id = tokenizer.pad_token_id, but it seems that adding the token this way does not add an embedding for it!
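And here is a sketch of what seems to be needed on the model side on top of the config line above. The resize_token_embeddings call is my understanding of how the newly added token actually gets an embedding row, so take it as an assumption rather than an official recipe:

```python
from transformers import AutoModelForSequenceClassification, AutoTokenizer

tokenizer = AutoTokenizer.from_pretrained("openai-gpt")
tokenizer.add_special_tokens({"pad_token": "[pad]"})  # grows the vocab by 1

model = AutoModelForSequenceClassification.from_pretrained(
    "openai-gpt", num_labels=2
)

# Tell the model which id is the padding id (the config line above) ...
model.config.pad_token_id = tokenizer.pad_token_id

# ... and also grow the embedding matrix so the new "[pad]" id actually
# has a row; without this the pad id points past the embedding table.
model.resize_token_embeddings(len(tokenizer))
```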
Now here is my simple question: since GPT models are autoregressive, I am not sure we really need [pad] tokens to learn, do we? If we really do, then is it too much to ask of the HuggingFace community to provide a blog post about these nuances in fine-tuning? Otherwise, all these blog posts on the relatively easier case of fine-tuning BERT are of no (in fact negative) use if someone has to spend many hours trying to figure out the small details.