Llama2 pad token for batched inference

I find the following works very well:

```python
tokenizer.pad_token = "[PAD]"
tokenizer.padding_side = "left"
```

I used to use what you had, but I found that batched inference with that setup gives different results than sequential inference, which is not supposed to happen.
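
For context, here is a minimal sketch of how this fits into a batched generation call. The checkpoint name (`meta-llama/Llama-2-7b-hf`), the prompts, and the generation parameters are my own assumptions for illustration; adjust them to your setup:

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "meta-llama/Llama-2-7b-hf"  # assumed checkpoint; use whichever Llama 2 variant you have

tokenizer = AutoTokenizer.from_pretrained(model_name)
tokenizer.pad_token = "[PAD]"    # Llama 2 ships without a pad token, so set one explicitly
tokenizer.padding_side = "left"  # left-pad so every prompt ends right where generation starts

model = AutoModelForCausalLM.from_pretrained(model_name, torch_dtype=torch.float16)
model.eval()

prompts = ["Tell me a joke.", "Explain beam search in one sentence."]
batch = tokenizer(prompts, return_tensors="pt", padding=True)  # pads shorter prompts on the left

with torch.no_grad():
    out = model.generate(
        **batch,  # input_ids plus the attention_mask that masks out the padding
        max_new_tokens=50,
        pad_token_id=tokenizer.pad_token_id,
    )

print(tokenizer.batch_decode(out, skip_special_tokens=True))
```

The reason left padding matters here: with right padding, the pad tokens end up between the prompt and the newly generated tokens, so batched outputs can diverge from sequential ones. Left padding keeps each prompt flush against the generation boundary.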
