Batch generation with GPT2

@lqtrung what you described as option 1 (right padding during training, left padding during inference) is the way to go.
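
Here is a minimal sketch of left-padded batch generation with the transformers library; the model name, prompts, and generation parameters are illustrative:

```python
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
tokenizer.padding_side = "left"            # pad on the left for batched generation
tokenizer.pad_token = tokenizer.eos_token  # GPT2 has no pad token; reuse EOS

model = GPT2LMHeadModel.from_pretrained("gpt2")
model.config.pad_token_id = model.config.eos_token_id

prompts = ["Hello, my name is", "The capital of France is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

# The attention mask tells the model to ignore the left padding
outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],
    max_new_tokens=20,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```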

You can also always pass position_ids, but the settings above get you correct results without passing them. One caveat: you never want GPT2 to generate after its pad token (note that GPT2 doesn't have a pad token, but it is common to set the pad token to the EOS token), even if you pass the correct position_ids. GPT2 was not trained for that case, and the results will be gibberish; right padding will often put you in this situation.
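
If you do want to pass position_ids explicitly, a common sketch is to derive them from the attention mask so that padding positions are skipped; this reuses `model` and `inputs` from the snippet above, and the fill value on pad positions is arbitrary since those positions are masked out:

```python
import torch

attention_mask = inputs["attention_mask"]

# Real tokens get positions 0, 1, 2, ... counted from the first non-pad token
position_ids = attention_mask.long().cumsum(-1) - 1
position_ids.masked_fill_(attention_mask == 0, 1)  # any valid index works for pads

logits = model(
    input_ids=inputs["input_ids"],
    attention_mask=attention_mask,
    position_ids=position_ids,
).logits
```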

A good resource to reason about this is The Illustrated GPT-2 :slight_smile: