Speculative Decoding: How to verify multiple tokens in a single forward pass?

In the blog post on speculative decoding, the main idea is that we can use a smaller model to generate tokens, and then “verify” these generated tokens in one go using the larger model.

However, I don't quite follow how we can verify multiple tokens in a single forward pass of the larger model.

For example, assume tokens are at word boundaries, and the smaller model generates the following text in 5 forward passes:

the quick brown sock jumps

Since we do not know which token / word is incorrect (it is actually the 4th word), won't we need to check each token one by one using the larger model? That would require at least 4 verifications:

  1. the → quick [pass]
  2. the quick → brown [pass]
  3. the quick brown → sock [fail]
  4. the quick brown fox → jumps [pass]

The blog (and the underlying paper) seem to claim that all these verifications can be done in a single forward pass of the larger model. How is that feasible?
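
To make the claim concrete, here is a minimal sketch of what I understand it to mean. It relies on the fact that a causal language model returns a next-token prediction for every position of its input in one forward pass, so the checks listed above correspond to reading different positions of the same output. Everything here is an assumption for illustration: `TARGET_NEXT`, `target_forward` and `verify` are hypothetical names, the lookup table is a toy stand-in for the larger model, and acceptance is plain greedy matching rather than the probabilistic acceptance rule used in the paper.

```python
# Toy stand-in for the larger model (hypothetical): prefix -> greedy next token.
TARGET_NEXT = {
    ("the",): "quick",
    ("the", "quick"): "brown",
    ("the", "quick", "brown"): "fox",
    ("the", "quick", "brown", "sock"): "falls",
    ("the", "quick", "brown", "fox"): "jumps",
}


def target_forward(tokens):
    """Simulate ONE forward pass of the larger model.

    A real causal LM returns next-token logits for every position at once
    (causal masking lets each position see only its own prefix). This toy
    version reproduces that output shape with a lookup per prefix.
    """
    return [
        TARGET_NEXT.get(tuple(tokens[: i + 1]), "<unk>")
        for i in range(len(tokens))
    ]


def verify(prompt, draft):
    """Greedy verification of draft tokens using a single target call."""
    if not prompt:
        raise ValueError("prompt must contain at least one token")
    preds = target_forward(prompt + draft)  # the only call to the larger model
    accepted = []
    for i, token in enumerate(draft):
        # Prediction made from the prefix that ends just before this draft token.
        expected = preds[len(prompt) + i - 1]
        if token != expected:
            # First mismatch: keep the larger model's token, drop the rest.
            accepted.append(expected)
            return accepted
        accepted.append(token)
    # Every draft token matched, so the last position yields one extra token.
    accepted.append(preds[-1])
    return accepted


print(verify(["the"], ["quick", "brown", "sock", "jumps"]))
```

In this sketch, checks 1 to 3 from the list are all read from the same output. Check 4 is not performed in that pass: the prediction after "sock" was conditioned on the wrong prefix, so "jumps" is discarded and drafting restarts from "the quick brown fox".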
