When I input the tokenized string "Hello, my dog is cute", GPT2LMHeadModel predicts "Hello, my dog is cute. I'm not sure if she's a puppy".
If I instead pad the token IDs with four 0s on the left and use the attention mask [0,0,0,0,1,1,1,1,1,1,1], the prediction changes to "!!!!Hello, my dog is cute hello hello hello hello hello hello hello hello hello hello".
I would have thought these two inputs should lead to the same result?
I have a notebook to reproduce the experiment above: Google Colab
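For reference, this is roughly what the notebook does (a minimal sketch; the exact model checkpoint, generation length, and padding token ID 0 are my assumptions about the setup):

```python
import torch
from transformers import GPT2Tokenizer, GPT2LMHeadModel

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")
model.eval()

# Case 1: plain input, no padding
inputs = tokenizer("Hello, my dog is cute", return_tensors="pt")
out = model.generate(
    inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_length=20,
)
print(tokenizer.decode(out[0]))

# Case 2: left-pad with four 0s and mask them out
pad_ids = torch.zeros((1, 4), dtype=torch.long)      # token ID 0
pad_mask = torch.zeros((1, 4), dtype=torch.long)     # masked positions
padded_ids = torch.cat([pad_ids, inputs["input_ids"]], dim=1)
padded_mask = torch.cat([pad_mask, inputs["attention_mask"]], dim=1)
out = model.generate(
    padded_ids,
    attention_mask=padded_mask,
    max_length=24,
)
print(tokenizer.decode(out[0]))
```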