In the blog post on speculative decoding, the main idea is that we can use a smaller draft model to generate tokens, and then "verify" all of those generated tokens in one go using the larger model.
However, I don't quite follow how we can verify multiple tokens in a single forward pass of the larger model.
For example, assuming tokens fall at word boundaries, suppose the smaller model generates the following text in 5 forward passes:
the quick brown sock jumps
Since we do not know which token/word is incorrect (here, the 4th), won't we need to check each token one by one using the larger model? That would require at least 4 verifications:
- the → quick [pass]
- the quick → brown [pass]
- the quick brown → sock [fail]
- the quick brown fox → jumps [pass]
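To make my mental model concrete, here is a sketch of the sequential check I have in mind. The `large_model_next_token` function is just a toy stand-in (a hard-coded lookup, assuming greedy decoding), not any real model API:

```python
def large_model_next_token(prefix):
    # Toy stand-in for the large model: returns its greedy (top-1)
    # continuation for each prefix. A real model would run a forward pass.
    oracle = {
        (): "the",
        ("the",): "quick",
        ("the", "quick"): "brown",
        ("the", "quick", "brown"): "fox",
        ("the", "quick", "brown", "fox"): "jumps",
    }
    return oracle[tuple(prefix)]

def verify_sequentially(draft_tokens):
    """Check the draft tokens one by one -- one large-model call per token."""
    accepted = []
    for token in draft_tokens:
        if large_model_next_token(accepted) == token:
            accepted.append(token)  # draft token matches: keep it
        else:
            # Mismatch: substitute the large model's own token and stop.
            accepted.append(large_model_next_token(accepted))
            break
    return accepted

print(verify_sequentially(["the", "quick", "brown", "sock", "jumps"]))
# ['the', 'quick', 'brown', 'fox']
```

As written, this needs one large-model call per draft token (4 calls before "sock" is rejected), which is exactly the cost I don't see how to avoid in a single forward pass.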
The blog (and the underlying paper) seems to claim that all these verifications can be done in a single forward pass of the larger model. How is that feasible?