Outputs change when reusing the KV cache (past_key_values) in model.forward and generation

I have found a post that could explain this: Possible Bug with KV Caching in Llama (original) model · Issue #25420 · huggingface/transformers · GitHub

In short: using the KV cache can change the logits, because cached and uncached forward passes take different computation paths and therefore accumulate floating-point results in a different order. The effect is especially noticeable when the model is loaded in 16-bit precision, where rounding error is much larger.
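The root cause is that floating-point addition is not associative, and the error grows as precision shrinks. A minimal sketch (using NumPy's float16, not the transformers library itself) shows how summing the same numbers in a different order gives different results in half precision, which is exactly what happens when attention reductions are split differently between cached and uncached passes:

```python
import numpy as np

# Three float16 values: 1.0 plus two tiny increments of 2**-11.
# 2**-11 is exactly half a ULP of 1.0 in float16 (eps = 2**-10).
a = np.float16(1.0)
b = np.float16(2.0 ** -11)
c = np.float16(2.0 ** -11)

# Order 1: add the increments to 1.0 one at a time.
# Each addition ties-to-even and rounds back down to 1.0.
left = (a + b) + c   # -> 1.0

# Order 2: combine the increments first (2**-11 + 2**-11 = 2**-10),
# which is now large enough to survive the rounding step.
right = a + (b + c)  # -> 1.0009765625

print(left, right, left == right)  # 1.0 1.001 False
```

The same inputs, summed in a different order, yield different half-precision results; in a model with millions of such accumulations per layer, the cached and uncached logits drift apart slightly, so small differences like the ones in the linked issue are expected rather than a correctness bug.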
