Batch generation with GPT2

Hi @joaogante , thank you for the response.

I believe that the position_ids are properly prepared during generation, as you said, because prepare_inputs_for_generation is called …

But my question is about training, where that function is not called and the GPT-2 modeling script does not compute position_ids from the attention mask (so the positions are not correct when ‘left’ padding is used …)
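
For context, here is a minimal sketch of what I mean by computing position_ids from the attention mask during training with left padding (this is my own illustration, mirroring what I believe prepare_inputs_for_generation does at generation time, not code from the library):

```python
import torch

# attention_mask has 0 on the (left) padded positions and 1 on real tokens.
attention_mask = torch.tensor([
    [0, 0, 1, 1, 1],   # left-padded example
    [1, 1, 1, 1, 1],   # full-length example
])

# Cumulative sum gives increasing positions over the real tokens;
# subtracting 1 makes the first real token start at position 0.
position_ids = attention_mask.long().cumsum(-1) - 1
# Padded positions end up at -1; set them to a valid dummy index
# (their value should not matter since they are masked out anyway).
position_ids.masked_fill_(attention_mask == 0, 1)

# position_ids -> [[1, 1, 0, 1, 2],
#                  [0, 1, 2, 3, 4]]

# These could then be passed explicitly during training, e.g.:
# outputs = model(input_ids=input_ids, attention_mask=attention_mask,
#                 position_ids=position_ids, labels=labels)
```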

So I’m not sure about the recommended practice:

  1. Is ‘right’ padding always used during training, with ‘left’ padding only used during batch generation?
  2. Or should training and generation use the same padding scheme, in which case the GPT-2 modeling script should handle the position_ids better?