Padding to the left of the inputs makes GPT2LMHeadModel give a different answer

When I input the tokenized string "Hello, my dog is cute", GPT2LMHeadModel predicts "Hello, my dog is cute. I'm not sure if she's a puppy."

If I add four 0s to the left of the token IDs and use the attention mask [0,0,0,0,1,1,1,1,1,1,1], the prediction changes to "!!!!Hello, my dog is cute hello hello hello hello hello hello hello hello hello hello."

I had expected these two inputs to give the same result. Why don't they?

I have a Google Colab notebook that reproduces the experiment above.

Hi @wangkuiyi :wave: Your examples don't account for position_ids. Whatever padding you insert, the position_ids of a given token should stay the same. If you don't pass them, the model counts positions from the start of the sequence, so left padding shifts every real token to a later position :slight_smile:
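To make that concrete, here is a minimal sketch of how padding-invariant position_ids can be derived from the attention mask. It uses plain Python lists so it runs without any dependencies. The helper name `position_ids_from_mask` and the placeholder value 1 for padded slots are assumptions for illustration (padded slots are masked out, so their position value should not matter).

```python
from itertools import accumulate


def position_ids_from_mask(attention_mask, pad_position=1):
    """Number real tokens 0, 1, 2, ... and give padded slots a placeholder.

    Hypothetical helper for illustration: it mirrors the common
    "cumulative sum of the mask, minus one" trick.
    """
    running = accumulate(attention_mask)  # count of real tokens seen so far
    return [
        count - 1 if keep else pad_position
        for count, keep in zip(running, attention_mask)
    ]


# Masks from the question: seven real tokens, then the same with four pads on the left.
unpadded_mask = [1, 1, 1, 1, 1, 1, 1]
padded_mask = [0, 0, 0, 0] + unpadded_mask

unpadded_positions = position_ids_from_mask(unpadded_mask)
padded_positions = position_ids_from_mask(padded_mask)

print(unpadded_positions)  # [0, 1, 2, 3, 4, 5, 6]
print(padded_positions)    # [1, 1, 1, 1, 0, 1, 2, 3, 4, 5, 6]
```

With the left-padded input, the real tokens now get positions 0 to 6, the same as in the unpadded input, instead of 4 to 10. You would then convert `padded_positions` to a tensor of shape (1, 11) and pass it as the `position_ids` argument in the forward call, alongside `input_ids` and `attention_mask`.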

Thank you, Joao @joaogante !