@lqtrung what you described as option 1 (right padding during training, left padding during inference) is the way to go.
You can also always pass position_ids, but the settings above get you the correct results without passing them. A caveat here: you never want GPT2 to generate after its pad token (note: GPT2 doesn't have a pad token, but it is common to set pad token = eos token), even if you pass the correct position_ids. GPT2 was not trained for that case, and the results will be gibberish; right padding will often put you in exactly that situation.
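To make the position_ids point concrete, here is a small sketch of how correct positions can be derived from the attention mask for a left-padded batch (the tensors are made up for illustration; this mirrors the cumsum-on-mask trick, not any particular library internal):

```python
import torch

# Two left-padded sequences: 0 marks a pad position, 1 a real token.
attention_mask = torch.tensor([
    [0, 0, 1, 1, 1],  # 2 pad tokens, then 3 real tokens
    [1, 1, 1, 1, 1],  # no padding
])

# Count positions over real tokens only, so the first real token gets
# position 0 regardless of how much padding precedes it.
position_ids = attention_mask.cumsum(dim=-1) - 1
# Pad positions get a dummy value (they are masked out anyway).
position_ids.masked_fill_(attention_mask == 0, 1)

print(position_ids)
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```

With left padding, the last column always holds the highest position of each row, so generation continues from the right place; with right padding that invariant breaks, which is exactly the failure mode described above.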
A good resource for reasoning about this is The Illustrated GPT-2.