@lqtrung what you described as option 1 (right padding during training, left padding during inference) is the way to go.
You can also always pass position_ids, but the settings above get you the correct results without passing them. A caveat here: you never want GPT2 to generate after its pad token (note: GPT2 doesn't have a pad token, but it is common to set pad token = eos token), even if you pass the correct position_ids. GPT2 was not trained for that case, and the results will be gibberish; right padding will often put you in exactly that situation.
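To make the position_ids point concrete, here is a small sketch of how correct positions can be derived from the attention mask for a left-padded batch (the tensors are made up for illustration; this mirrors the cumsum-on-mask trick, not any particular library internal):

```python
import torch

# Two left-padded sequences: 0 marks a pad position, 1 a real token.
attention_mask = torch.tensor([
    [0, 0, 1, 1, 1],  # 2 pad tokens, then 3 real tokens
    [1, 1, 1, 1, 1],  # no padding
])

# Count positions over real tokens only, so the first real token gets
# position 0 regardless of how much padding precedes it.
position_ids = attention_mask.cumsum(dim=-1) - 1
# Pad positions get a dummy value (they are masked out anyway).
position_ids.masked_fill_(attention_mask == 0, 1)

print(position_ids)
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```

With left padding, the last column always holds the highest position of each row, so generation continues from the right place; with right padding that invariant breaks, which is exactly the failure mode described above.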
A good resource for reasoning about this is The Illustrated GPT-2.