What is the purpose of 'use_cache' in decoder?

yjernite · September 2, 2020, 2:25pm

The cache is only used for generation, not for training.

Say you have M input tokens and want to generate N output tokens.

Without the cache, the model computes the M hidden states for the input, then generates a first output token. Then it computes the hidden state for the first generated token and generates a second one. Then it computes the hidden states for the first two generated tokens to generate the third one, and so on and so forth.
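To make the amount of repeated work concrete, here is a minimal counting sketch of that loop. It is not how any real model is implemented: `hidden_state` and `generate_without_cache` are hypothetical names, the "hidden state" is just a toy function of the tokens so far, the loop starts from a single start token, and the M input tokens are left out because they are only processed once.

```python
def hidden_state(prefix):
    """Toy stand-in for a decoder: the state at a position depends on all tokens up to it."""
    return sum((i + 1) * tok for i, tok in enumerate(prefix)) % 97


def generate_without_cache(start_token, n_new):
    """Generate n_new tokens, recomputing every output hidden state at every step."""
    tokens = [start_token]
    computed = 0  # how many hidden states were computed in total
    for _ in range(n_new):
        # No cache: recompute the state of every output position from scratch.
        states = [hidden_state(tokens[: i + 1]) for i in range(len(tokens))]
        computed += len(states)
        # Only the last state is actually needed to pick the next token.
        tokens.append(states[-1] % 10)
    return tokens, computed


tokens, computed = generate_without_cache(start_token=1, n_new=5)
print(len(tokens) - 1, "new tokens,", computed, "hidden states computed")
```

For 5 new tokens this computes 1 + 2 + 3 + 4 + 5 = 15 hidden states, even though only 5 of them are new.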

However, since the output side is auto-regressive, each output position only attends to the positions before it. An output token's hidden state therefore stays the same at every later generation step once it has been computed, so recomputing it every time we want to generate a new token is wasteful.

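With the cache, the model saves the hidden state once it has been computed (in practice, the attention keys and values of each layer), and at each time step only computes the one for the most recently generated output token, re-using the saved ones for the previous tokens. This reduces the generation complexity from O(n^3) to O(n^2) for a transformer model, where n is the number of generated tokens: without the cache, each of the n steps redoes attention for all tokens so far, which is O(n^2) per step, while with the cache each step only attends from the newest token, which is O(n) per step.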
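Here is a rough sketch of the same idea on a toy single-head attention layer, counting attention scores with and without the cache. Everything in it is an assumption made for illustration: `embed`, `attend`, `pick_token` and `generate` are hypothetical helpers, keys, values and queries are all the raw embedding (no learned projections), and there is only one layer. It is not the Transformers implementation.

```python
import math

VOCAB = 50  # toy vocabulary size (assumption)


def embed(token, dim=4):
    """Deterministic toy embedding, used as query, key and value at once."""
    return [math.sin(token * (j + 1) + 1.0) for j in range(dim)]


def attend(query, keys, values):
    """Softmax attention of one query over keys/values. Returns (output, number of scores)."""
    scores = [sum(q * k for q, k in zip(query, key)) for key in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    out = [sum(w * v[j] for w, v in zip(weights, values)) / total
           for j in range(len(query))]
    return out, len(scores)


def pick_token(state):
    """Toy replacement for the LM head: map a hidden state to a token id."""
    return int(abs(sum(state)) * 1000) % VOCAB


def generate(start_token, n_new, use_cache):
    tokens = [start_token]
    cache = []    # saved key/value vectors, one per token already processed
    n_scores = 0  # total number of attention scores computed
    for _ in range(n_new):
        if use_cache:
            # Only the newest token is processed; older keys/values are re-used.
            cache.append(embed(tokens[-1]))
            state, n = attend(cache[-1], cache, cache)
            n_scores += n
        else:
            # Every position is recomputed, each attending to its own prefix.
            keys = [embed(t) for t in tokens]
            for i in range(len(keys)):
                state, n = attend(keys[i], keys[: i + 1], keys[: i + 1])
                n_scores += n
        tokens.append(pick_token(state))
    return tokens, n_scores


cached_tokens, cached_scores = generate(3, 6, use_cache=True)
plain_tokens, plain_scores = generate(3, 6, use_cache=False)
print(cached_tokens == plain_tokens)  # same tokens either way
print(cached_scores, plain_scores)    # 21 vs 56 attention scores
```

Both runs produce the same tokens, but for 6 new tokens the cached run computes 1 + 2 + ... + 6 = 21 attention scores while the uncached run computes 1 + 3 + 6 + 10 + 15 + 21 = 56. The first sum grows like n^2 and the second like n^3.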

Hope that helps, let me know if you have any further questions!
