Batch generation with GPT2

Hi @joaogante , thank you for the response.

I believe that the position_ids are properly prepared during generation, as you said, because prepare_inputs_for_generation is called …

But my question is about training, where that function is not called and the GPT-2 modeling script does not compute position_ids from the attention mask (so the positions are not correct when ‘left’ padding is used …)
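
For context, here is a minimal sketch of what I mean by computing position_ids from the attention mask during training with left padding (this is my own illustration, mirroring what I believe prepare_inputs_for_generation does at generation time, not code from the library):

```python
import torch

# attention_mask has 0 on the (left) padded positions and 1 on real tokens.
attention_mask = torch.tensor([
    [0, 0, 1, 1, 1],   # left-padded example
    [1, 1, 1, 1, 1],   # full-length example
])

# Cumulative sum gives increasing positions over the real tokens;
# subtracting 1 makes the first real token start at position 0.
position_ids = attention_mask.long().cumsum(-1) - 1
# Padded positions end up at -1; set them to a valid dummy index
# (their value should not matter since they are masked out anyway).
position_ids.masked_fill_(attention_mask == 0, 1)

# position_ids -> [[1, 1, 0, 1, 2],
#                  [0, 1, 2, 3, 4]]

# These could then be passed explicitly during training, e.g.:
# outputs = model(input_ids=input_ids, attention_mask=attention_mask,
#                 position_ids=position_ids, labels=labels)
```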

So I’m not sure about the recommended practice:

  1. Is ‘right’ padding always used during training, with ‘left’ padding only used during batch generation?
  2. Or should training and generation use the same padding scheme, in which case the GPT-2 modeling script should handle the position_ids better?