Constrained Beam Search - Very Slow

Hello,
I am trying out generation in the following manner:

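import torch

# Constrained beam search: each entry of force_words_ids (a list of
# token-id lists, one per forced word) must appear in the output.
with torch.no_grad():
    generated_tokens = model.generate(
        desc_tokens.to('cuda'),
        max_new_tokens=256,
        min_new_tokens=256,  # together with max_new_tokens: exactly 256 new tokens
        force_words_ids=constraint_tokens,
        num_beams=2,
    )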

This is taking unexpectedly long. Is there a known issue?

In my case, generation speed is strongly affected by the number of constraint words, i.e., len(constraint_tokens). Try removing the constraints, or reducing their number, and check whether generation speeds up; that tells you whether this is the factor.
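A minimal sketch of that check, assuming you wrap your own generate call in a function that takes a list of constraints. `time_with_constraint_subsets` and `fake_run` are hypothetical helpers written for illustration, not part of transformers; `fake_run` only sleeps so the helper can be exercised without a model.

```python
import time
from typing import Callable, List, Sequence, Tuple


def time_with_constraint_subsets(
    run: Callable[[list], object],
    constraints: Sequence,
    sizes: Sequence[int],
    repeats: int = 1,
) -> List[Tuple[int, float]]:
    """Time run(constraints[:n]) for each n in sizes.

    Returns (n, best_seconds) pairs; best_seconds is the fastest of
    `repeats` calls, so one-off warm-up costs skew the result less.
    """
    results = []
    for n in sizes:
        subset = list(constraints[:n])  # keep only the first n constraints
        best = float("inf")
        for _ in range(repeats):
            start = time.perf_counter()
            run(subset)
            best = min(best, time.perf_counter() - start)
        results.append((n, best))
    return results


if __name__ == "__main__":
    # Hypothetical stand-in for model.generate: takes longer as more
    # constraints are passed in.
    def fake_run(subset):
        time.sleep(0.01 * len(subset))

    for n, secs in time_with_constraint_subsets(fake_run, [[1], [2], [3], [4]], [0, 2, 4]):
        print(f"{n} constraints: {secs:.3f}s")
```

With a real model, `run` could be something like `lambda subset: model.generate(desc_tokens.to('cuda'), max_new_tokens=256, force_words_ids=subset or None, num_beams=2)`. The `or None` is there because, as far as I know, transformers rejects an empty `force_words_ids` list, and `None` falls back to plain beam search.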
As for the reason, you can refer to this great post about constrained beam search. In brief, a large number of constraints means a much larger effective beam size, and thus a longer delay.
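A back-of-the-envelope cost model of that effect, as a toy sketch only, not how transformers actually implements it. It assumes each beam proposes its `top_k` ordinary tokens plus one forced token per still-unsatisfied constraint word (worst case: none satisfied yet), and that every candidate then updates a progress record for each constraint. `step_cost` and the value `top_k=4` are hypothetical.

```python
from typing import Tuple


def step_cost(num_beams: int, top_k: int, num_constraints: int) -> Tuple[int, int]:
    """Toy per-step cost of constrained beam search.

    Assumptions (illustrative only):
      * each beam proposes top_k ordinary tokens plus one forced token
        per pending constraint word;
      * each candidate updates one progress record per constraint.
    Returns (candidates_scored, constraint_state_updates).
    """
    candidates = num_beams * (top_k + num_constraints)
    state_updates = candidates * num_constraints
    return candidates, state_updates


if __name__ == "__main__":
    # num_beams=2 as in the question; top_k=4 is a hypothetical value.
    for c in (0, 5, 50):
        cand, upd = step_cost(num_beams=2, top_k=4, num_constraints=c)
        print(f"{c:>3} constraints: {cand:>4} candidates, {upd:>5} state updates per step")
```

Under these assumptions the candidate count grows linearly with the number of constraints (8, 18, 108 for 0, 5, 50 constraints), while the bookkeeping grows roughly quadratically (0, 90, 5400), repeated for each of the 256 generated tokens.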