The code pads all inputs to the same length, namely the length of the longest input in the batch. However, it is important to note that during inference, the output lengths of different inputs can vary.
After testing the code with nine sentences, whose input lengths range from 32 to 1140 tokens, it was observed that model.generate(inputs_padded) completed after only one forward (decoding) pass. This suggests that the model did not perform decoding correctly.
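For reference, here is a minimal sketch of the setup described above, assuming a Hugging Face causal LM. The checkpoint name, the sentence list, and the reuse of the EOS token as padding are placeholders and assumptions for illustration, not the actual code.

```python
# Minimal sketch of the described setup (assumptions: Hugging Face causal LM,
# placeholder checkpoint name and sentences, EOS reused as the pad token).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "gpt2"  # placeholder checkpoint, not the actual model used
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).to("cuda")

# Many causal LMs define no pad token; reusing EOS here is an assumption
if tokenizer.pad_token is None:
    tokenizer.pad_token = tokenizer.eos_token

# Placeholder stand-ins for the nine sentences (32 to 1140 tokens long)
sentences = ["first input ...", "second, much longer input ...", "..."]

# Pad every input in the batch to the length of the longest one
batch = tokenizer(sentences, return_tensors="pt", padding=True).to("cuda")
inputs_padded = batch["input_ids"]

# First attempt: call generate with no length arguments
outputs = model.generate(inputs_padded)
```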
An additional attempt was made using model.generate(inputs_padded, max_new_tokens=64). However, this resulted in a CUDA error, probably because some sentences had finished generating while others had not.
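Continuing the sketch above, the second attempt was simply:

```python
# Second attempt, continuing the sketch above: cap the number of new tokens.
# In the actual run, this call is where the CUDA error appeared.
outputs = model.generate(inputs_padded, max_new_tokens=64)
```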
Any suggestions to solve these problems?