Hi @joaogante, thank you for the response.
I believe that position_ids is properly prepared during generation, as you said, because prepare_inputs_for_generation is called.
But my question is about training, where that function is not called and the gpt2 modeling script does not compute position_ids from the attention mask (so the positions are not correct when ‘left’ padding is used).
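For reference, here is a minimal sketch of what I mean by deriving position_ids from the attention mask during training, using the same cumsum-over-mask trick that prepare_inputs_for_generation applies (the tensors and the final forward call are just illustrative):

```python
import torch

# Hypothetical left-padded batch: 0 = padding token, 1 = real token
attention_mask = torch.tensor([[0, 0, 1, 1, 1],
                               [1, 1, 1, 1, 1]])

# Derive position_ids from the mask: real tokens get positions
# starting at 0, padded positions get a dummy value of 1.
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)

# position_ids is now [[1, 1, 0, 1, 2],
#                      [0, 1, 2, 3, 4]]
# These could then be passed explicitly in the training forward pass,
# e.g. (illustrative only):
# outputs = model(input_ids, attention_mask=attention_mask,
#                 position_ids=position_ids, labels=labels)
```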
So I’m not sure about the recommended practice:
- Is ‘right’ padding always used during training, and ‘left’ padding only used during batch generation?
- Or should training and generation use the same padding scheme, in which case the gpt2 modeling script should handle position_ids better?