I’m trying to train a video captioning model where I train only my image encoder and keep the text generator frozen. I am using the pre-trained `GPT2LMHeadModel` as the text generator and feed it the video embeddings directly as input (via `inputs_embeds`).
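For context, here is roughly what my forward pass looks like. It is a simplified sketch: the video features, the projection layer, and all shapes are placeholders, not my actual encoder.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.pad_token = tokenizer.eos_token          # GPT-2 has no PAD, so I reuse EOS

gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
for p in gpt2.parameters():                        # the text generator stays frozen
    p.requires_grad = False

# placeholder for my trainable video encoder output: (batch, frames, feat_dim)
video_feats = torch.randn(2, 8, 1024)
video_proj = nn.Linear(1024, gpt2.config.n_embd)   # trainable projection into GPT-2's hidden size
prefix_embeds = video_proj(video_feats)            # (2, 8, 768)

# caption tokens, embedded with GPT-2's own (frozen) embedding table
captions = tokenizer(["a dog runs", "a cat sleeps on a couch"],
                     return_tensors="pt", padding=True)
caption_embeds = gpt2.transformer.wte(captions["input_ids"])   # (2, seq_len, 768)

inputs_embeds = torch.cat([prefix_embeds, caption_embeds], dim=1)
logits = gpt2(inputs_embeds=inputs_embeds).logits  # (2, 8 + seq_len, vocab_size)
```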
For the tokenizer, I see that `GPT2Tokenizer` uses the same token id for BOS and EOS and has no dedicated PAD token, so I reuse that same id for padding.
- If I want my loss function to ignore the pad token, then it will also ignore the BOS and EOS tokens. Is this fine? Or do I need to assign the pad token a different id, which would then require me to finetune the text model as well since the embedding size would change? (What I mean by ignoring the pad token is sketched in the first snippet after this list.)
- If I want to use a separate tokenizer (which has a smaller vocab size), will training just the LM head (an `nn.Linear` with the vocab size as output dim) work? Or in such scenarios do I need to finetune GPT2 as well? (The head swap I have in mind is shown in the second snippet below.)
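To make the first question concrete, this is the masking I had in mind, continuing from the snippet above and computing the loss myself instead of passing `labels` to the model (the `-100` value is the default `ignore_index` of `CrossEntropyLoss`):

```python
import torch
import torch.nn as nn

pad_id = tokenizer.pad_token_id                    # same id as BOS/EOS here

labels = captions["input_ids"].clone()
labels[labels == pad_id] = -100                    # since PAD == EOS == BOS, those positions are masked too

# the video-prefix positions have no text target, so they are ignored as well
prefix_ignore = torch.full((labels.size(0), prefix_embeds.size(1)), -100, dtype=labels.dtype)
labels = torch.cat([prefix_ignore, labels], dim=1)

# shift so each position predicts the next token (what GPT-2 does internally when given `labels`)
shift_logits = logits[:, :-1, :].contiguous()
shift_labels = labels[:, 1:].contiguous()
loss = nn.CrossEntropyLoss(ignore_index=-100)(
    shift_logits.view(-1, shift_logits.size(-1)),
    shift_labels.view(-1),
)
```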
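And for the second question, by "training just the LM head" I mean something like this, where the new vocab size is a placeholder for whatever the smaller tokenizer actually has:

```python
import torch.nn as nn

new_vocab_size = 8000                              # placeholder: the smaller tokenizer's vocab size
gpt2.lm_head = nn.Linear(gpt2.config.n_embd, new_vocab_size, bias=False)
# the fresh Linear is trainable by default; the rest of GPT-2 stays frozen from above
```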