A friend found the answer. The issue is that attention_mask is not the second argument of generate, so passing it positionally binds it to a different parameter. It has to be passed by keyword.
So the last two lines should be:
output_tokens = model.generate(input_ids=input_ids, attention_mask=attention_mask)
output_texts = tokenizer.batch_decode(output_tokens)
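
To see why the keyword matters, here is a minimal sketch that does not use the real library. The generate function below is a hypothetical stand-in, and its parameter names and order (input_ids, max_length, attention_mask) are assumptions chosen only to illustrate the binding problem. The actual signature depends on the library version.

```python
# Hypothetical stand-in for a generate-style method. The parameter order is an
# assumption for illustration, not the real library signature.
def generate(input_ids=None, max_length=None, attention_mask=None):
    # Return the bound arguments so we can inspect where each value ended up.
    return {
        "input_ids": input_ids,
        "max_length": max_length,
        "attention_mask": attention_mask,
    }

input_ids = [[101, 7592, 102]]
attention_mask = [[1, 1, 1]]

# Positional call: the mask lands in whatever parameter comes second.
positional = generate(input_ids, attention_mask)

# Keyword call: each value reaches the parameter it was meant for.
keyword = generate(input_ids=input_ids, attention_mask=attention_mask)

print(positional["attention_mask"])  # the mask never arrived here
print(keyword["attention_mask"])     # the mask is bound correctly
```

In the positional call the mask is silently taken as the second parameter and attention_mask keeps its default, which matches the behaviour described above. Passing both arguments by keyword avoids depending on parameter order at all.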