Time complexity of the generate method in transformer library (using beam search)

sachjbp · December 9, 2021, 4:45pm

Hi,

I'm curious how the number of forward passes scales with the number of beams in the generate method in transformers. My reasoning is that even greedy generation requires at least max_seq_length forward passes, assuming we predict one token at a time. With k beams, predicting the first word takes one forward pass and predicting the second word takes k forward passes. Selecting the top k tokens from each of those forward passes gives k^2 options, so predicting the third word takes k^2 forward passes. The total number of forward passes is then the sum of a geometric series, 1 + k + k^2 + k^3 + ... (n terms), and hence O(k^n). Could anyone shed some light on this, or tell me whether my understanding is correct?
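To make the arithmetic concrete, here is a minimal sketch that just counts forward passes under two assumptions. It does not call the transformers library at all. The function names are hypothetical. The first function evaluates the geometric series from the reasoning above, where no candidate is ever discarded. The second assumes textbook beam search, where the candidates are pruned back to k hypotheses after every step, so each later step scores only k sequences. Whether those k sequences are run as separate passes or as one batched pass is an implementation detail this sketch does not model.

```python
def unpruned_forward_passes(k: int, n: int) -> int:
    """Count passes if every candidate is kept: 1 + k + k^2 + ... (n terms)."""
    return sum(k ** i for i in range(n))


def pruned_forward_passes(k: int, n: int) -> int:
    """Count passes if only k hypotheses survive each step (assumed textbook beam search).

    One pass for the first token, then k sequences scored per remaining step.
    """
    if n <= 0:
        return 0
    return 1 + k * (n - 1)


if __name__ == "__main__":
    # Compare the two counts for a few beam widths over a 10-token generation.
    for k in (1, 2, 4, 8):
        print(k, unpruned_forward_passes(k, 10), pruned_forward_passes(k, 10))
```

Under the pruning assumption the count grows linearly in both k and n, rather than exponentially in n. With k = 1 both functions reduce to the greedy count of n passes.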
