How to do batch generation with the GPT2 model?
Batch generation is now possible for GPT2 in master by leveraging the functionality shown in this PR: https://github.com/huggingface/transformers/pull/7552#event-3876130796
For more info on how to prepare GPT2 for batch generation, you can check out this test:
Hi, I am the author of the PR.
You can now do batch generation by calling the same generate().
All you need to add is:
- set tokenizer.padding_side = "left" (probably reset it back later)
- pass in attention_mask
Explanation: (see full example in the end)
- We need tokenizer.padding_side = "left" because we will use the logits of the right-most token to predict the next token, so the padding should be on the left.
- This is what this PR added. Here is a summary:
GPT-2 uses absolute positional embeddings (position_ids). Before this change, no position_ids were passed to the model, so the model automatically generated positions from 0 to n, even when there was padding (e.g. when the input is a batch).
<pad> <pad> a b c -> position_ids = 0 1 2 3 4, whereas what we expect is x x 0 1 2 (x means don’t care).
This PR computes position_ids in prepare_inputs_for_generation(), which is called in generate(), by calculating them from attention_mask, and that’s why you need to pass it in.
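To make the position_ids point concrete, here is a minimal pure-Python sketch of the idea: number the real tokens by taking a running sum over the attention mask, and put a placeholder in the padded slots. The function name position_ids_from_mask and the placeholder value are assumptions for illustration; this is not the library's code, which works on tensors.

```python
from itertools import accumulate


def position_ids_from_mask(attention_mask, pad_position=1):
    """Derive position ids for one sequence from its 0/1 attention mask.

    Real tokens are numbered 0, 1, 2, ... in order. Padded slots get
    `pad_position`, an arbitrary placeholder, since they are masked out.
    """
    running = accumulate(attention_mask)  # cumulative count of real tokens
    return [count - 1 if keep else pad_position
            for count, keep in zip(running, attention_mask)]


# Left-padded row: <pad> <pad> a b c
print(position_ids_from_mask([0, 0, 1, 1, 1]))  # [1, 1, 0, 1, 2]
# Unpadded row: positions are simply 0..n-1
print(position_ids_from_mask([1, 1, 1, 1, 1]))  # [0, 1, 2, 3, 4]
```

The first two entries of the left-padded result are the "x" (don't care) slots from the example above; a, b, c get positions 0, 1, 2 as expected.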
You can find a full example in the PR.
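For readers who do not want to dig through the PR, the recipe could look roughly like the sketch below. It is not the example from the PR. It assumes a reasonably recent transformers release (for max_new_tokens), the public "gpt2" checkpoint, and the common convention of reusing the EOS token as the pad token; the helper name batch_generate is made up for illustration.

```python
def batch_generate(prompts, model_name="gpt2", max_new_tokens=20):
    """Generate continuations for a batch of prompts using left padding."""
    # Imported here so the heavy dependencies load only when generating.
    import torch
    from transformers import GPT2LMHeadModel, GPT2Tokenizer

    tokenizer = GPT2Tokenizer.from_pretrained(model_name)
    model = GPT2LMHeadModel.from_pretrained(model_name)
    model.eval()

    # Assumption: GPT-2 has no pad token, so reuse EOS as the pad token.
    tokenizer.pad_token = tokenizer.eos_token

    # Pad on the left for generation, then restore the previous setting.
    original_side = tokenizer.padding_side
    tokenizer.padding_side = "left"
    try:
        inputs = tokenizer(prompts, return_tensors="pt", padding=True)
    finally:
        tokenizer.padding_side = original_side

    with torch.no_grad():
        output_ids = model.generate(
            input_ids=inputs["input_ids"],
            # The mask is what lets generate() derive correct position_ids.
            attention_mask=inputs["attention_mask"],
            max_new_tokens=max_new_tokens,
            pad_token_id=tokenizer.eos_token_id,
        )
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)
```

Calling batch_generate(["Hello, my dog is", "Today"]) should return one decoded string per prompt, each starting with its prompt. The first call downloads the checkpoint, so it needs network access.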
Hi, there. Thanks for your work to support batch inference in GPT2. However, I still have one confusion, which may need your help. Thanks in advance!
If I wanna pass the “past_key_values”, how should I process the position_ids and attention mask? Supposing the length of my past_key_values is 2, and the padded input is just like your example: <pad>, <pad>, a, b, c. Should I change the attention mask from 0, 0, 1, 1, 1 to 1, 1, 0, 0, 1, 1, 1, where the first two 1s refer to the past_key_values?
Thanks a lot!
@patrickvonplaten @ttj I think this is a good question! Could we discuss on how to do batch inference with
Is it possible to have a variable max_gen_length, depending on the length of the input sequence, for instance? (e.g. max_gen_length = len(tokenizer.tokenize(input_seq)) + 20)
It looks like you are looking for
Hi, I’m using the input parameter “past_key_values” to train a GPT model. So I wonder: when doing batch generation in this way, if I pass “past_key_values” to the model through the parameter “model_kwargs”, will the generation method work as expected?
Correct me if I’m wrong, but for GPT2, during training, when padding_side='left', position_ids should be passed explicitly, or else the position_ids are created as if right padding is used … making it inconsistent, right?
I’m referring to this line, where the position_ids are created if they are not passed: transformers/modeling_gpt2.py at main · huggingface/transformers (github.com)
Thank you!
Hey @lqtrung The
position_ids don’t need to be passed, as long as the right
prepare_inputs_for_generation (see here) takes care of that for you
Hi @joaogante, thank you for the response.
I believe that the position_ids are properly prepared during generation, as you said, because prepare_inputs_for_generation is called …
But my question is about training, where that function is not called and the GPT2 modeling script does not compute position_ids from the attention mask (so they are not correct when ‘left’ padding is used …)
So I’m not sure about the recommended practice:
- Is ‘right’ padding always used during training … and ‘left’ padding only used during batch generation?
- Or should training and generation use the same padding scheme, in which case the GPT2 modeling script should handle the position_ids better?
@lqtrung what you described as option 1 (right padding during training, left padding during inference) is the way to go.
You can also always pass position_ids, but the settings above get you the correct results without passing them. A caveat here is that you never want GPT2 to generate after its pad token (note: GPT2 doesn’t have a pad token, but it is common to set pad token = eos token), even if you pass the correct position_ids. GPT2 was not trained for that case, and the results will be gibberish; right padding will often get you into this situation.
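A small pure-Python sketch may help show why right padding runs into the "generate after the pad token" situation. The helper pad_batch, the token ids, and the pad id are all hypothetical and only illustrate the layout; a real tokenizer does this for you when padding=True.

```python
def pad_batch(sequences, pad_id, side):
    """Pad token-id lists to equal length; return (input_ids, attention_mask)."""
    if side not in ("left", "right"):
        raise ValueError("side must be 'left' or 'right'")
    width = max(len(seq) for seq in sequences)
    input_ids, attention_mask = [], []
    for seq in sequences:
        padding = [pad_id] * (width - len(seq))
        mask_pad = [0] * len(padding)
        real = [1] * len(seq)
        if side == "left":
            input_ids.append(padding + list(seq))
            attention_mask.append(mask_pad + real)
        else:
            input_ids.append(list(seq) + padding)
            attention_mask.append(real + mask_pad)
    return input_ids, attention_mask


batch = [[11, 12, 13], [21]]  # hypothetical token ids
PAD = 0                       # hypothetical pad id

right_ids, _ = pad_batch(batch, PAD, "right")
left_ids, _ = pad_batch(batch, PAD, "left")

# Generation continues from the last column of each row.
print([row[-1] for row in right_ids])  # [13, 0]: the short row ends in padding
print([row[-1] for row in left_ids])   # [13, 21]: every row ends in a real token
```

With right padding, the shorter row would be continued from a pad token, which is the case GPT2 was not trained for; with left padding, every row ends on its last real token.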
A good resource to reason about this is The Illustrated GPT-2.