Hello,
I am running generation in the following manner:
```python
with torch.no_grad():
    generated_tokens = model.generate(
        desc_tokens.to('cuda'),
        max_new_tokens=256,
        min_new_tokens=256,
        force_words_ids=constraint_tokens,
        num_beams=2,
    )
```
This is unexpectedly slow. Is there a known issue with this combination of options (constrained beam search via `force_words_ids`, plus forcing exactly 256 new tokens with `min_new_tokens=256`)?
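To narrow down which option dominates the runtime, one approach is to time `generate()` under each configuration separately (with and without `force_words_ids`, with and without `min_new_tokens`). A minimal, generic timing helper for that comparison might look like this (the helper name `time_call` is my own; `model` and `desc_tokens` are assumed to be defined as in the snippet above):

```python
import time

def time_call(fn, *args, **kwargs):
    """Call fn(*args, **kwargs) once; return (result, elapsed wall-clock seconds)."""
    start = time.perf_counter()
    result = fn(*args, **kwargs)
    return result, time.perf_counter() - start

# Hypothetical usage against the snippet above, e.g.:
#   _, t_plain = time_call(model.generate, desc_tokens.to('cuda'),
#                          max_new_tokens=256, num_beams=2)
#   _, t_forced = time_call(model.generate, desc_tokens.to('cuda'),
#                           max_new_tokens=256, min_new_tokens=256,
#                           force_words_ids=constraint_tokens, num_beams=2)
# Comparing t_plain and t_forced shows how much the constraints cost.
```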