Novel, provably optimal, lossy variant of speculative decoding


Speculative decoding is a hot topic right now!

Here is a blog post I wrote proposing mentored decoding, a new variant of speculative decoding. It maximizes the probability of accepting the draft tokens while keeping the Kullback-Leibler divergence between the resulting distribution and the target distribution below a constant D. Speculative decoding can be seen as the special case of mentored decoding with D = 0.
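For context, here is a minimal Python sketch of the baseline, i.e. one step of standard (lossless) speculative sampling, which is the D = 0 case. It is not the mentored decoding rule itself; that is described in the blog post. The function names (`kl_divergence`, `acceptance_probability`, `speculative_step`) and the toy distributions are illustrative assumptions, and the direction KL(result || target) is my reading of the constraint rather than something stated above.

```python
import math
import random


def kl_divergence(r, p):
    """KL(r || p) for two discrete distributions given as lists of probabilities.

    Assumption: r plays the role of the resulting distribution and p the target.
    """
    return sum(ri * math.log(ri / pi) for ri, pi in zip(r, p) if ri > 0)


def acceptance_probability(p, q):
    """Chance that a token drafted from q is accepted under the standard rule."""
    return sum(min(pi, qi) for pi, qi in zip(p, q))


def speculative_step(p, q, rng=random):
    """One standard speculative sampling step.

    p: target distribution, q: draft distribution (lists over the same vocabulary).
    Returns (token, accepted). The returned token is distributed exactly as p.
    """
    # Draw a candidate token from the draft distribution.
    draft = rng.choices(range(len(q)), weights=q)[0]
    # Accept it with probability min(1, p(x) / q(x)).
    if rng.random() < min(1.0, p[draft] / q[draft]):
        return draft, True
    # On rejection, resample from the residual max(0, p - q);
    # rng.choices normalizes the weights for us.
    residual = [max(0.0, pi - qi) for pi, qi in zip(p, q)]
    return rng.choices(range(len(residual)), weights=residual)[0], False


if __name__ == "__main__":
    p = [0.5, 0.3, 0.2]  # toy target distribution (made up for illustration)
    q = [0.2, 0.3, 0.5]  # toy draft distribution (made up for illustration)
    print("acceptance probability:", acceptance_probability(p, q))
    print("KL(p || p):", kl_divergence(p, p))
```

With these toy distributions the draft token is accepted about 70% of the time, and the output distribution matches the target exactly, so its KL divergence from the target is 0. A lossy variant trades some of that exactness (up to the budget D) for a higher acceptance probability.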

Feedback welcome!

Thanks to @joaogante, who got me interested in speculative decoding with his blog post.