Hi, I am the author of the PR.
You can now do batch generation by calling the same `generate()`.
All you need to add is:
- set `tokenizer.padding_side = "left"` (and probably reset it back afterwards)
- pass `attention_mask` to `generate()`
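A minimal sketch of what that looks like (the model name, prompts, and `max_length` here are placeholders, not taken from the PR):

```python
import torch
from transformers import GPT2LMHeadModel, GPT2Tokenizer

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
model = GPT2LMHeadModel.from_pretrained("gpt2")

# GPT-2 has no pad token; reusing the EOS token is a common workaround.
tokenizer.pad_token = tokenizer.eos_token
tokenizer.padding_side = "left"  # pad on the left so the last token is a real token

prompts = ["Hello, my dog", "The weather today is"]
inputs = tokenizer(prompts, return_tensors="pt", padding=True)

outputs = model.generate(
    input_ids=inputs["input_ids"],
    attention_mask=inputs["attention_mask"],  # needed so position_ids are computed correctly
    max_length=20,
    pad_token_id=tokenizer.eos_token_id,
)
print(tokenizer.batch_decode(outputs, skip_special_tokens=True))
```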
Explanation (see the full example at the end):
- We need `tokenizer.padding_side = "left"` because we use the logits of the right-most token to predict the next token, so the padding should be on the left.
- Passing `attention_mask` is what this PR added. Here is a summary:
GPT-2 uses absolute positional embeddings (`position_ids`). Before this change, no `position_ids` were passed to the model, and the model automatically generated them from 0 to n, even if there was padding (e.g. when the input is a batch).
Example: `tokens = <pad> <pad> a b c` -> `position_ids = 0 1 2 3 4`, while what we expect is `x x 0 1 2` (`x` means don't care).
This PR adds `position_ids` in `prepare_inputs_for_generation()`, which is called inside `generate()`, by calculating them from the `attention_mask`, and that's why you need to pass it in.
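Roughly, the idea is to count the real tokens seen so far in each row, so that padded positions don't shift the positions of the real tokens. A standalone illustration of that idea (not the exact PR code):

```python
import torch

attention_mask = torch.tensor([[0, 0, 1, 1, 1],   # <pad> <pad> a b c
                               [1, 1, 1, 1, 1]])  # no padding

position_ids = attention_mask.long().cumsum(-1) - 1  # number of real tokens before each position
position_ids.masked_fill_(attention_mask == 0, 1)    # padded positions: value doesn't matter

print(position_ids)
# tensor([[1, 1, 0, 1, 2],
#         [0, 1, 2, 3, 4]])
```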
You can find a full example in the PR.