Outputs change when reusing the KV cache (past_key_values) in model.forward and generation

I have found a post that could explain this: Possible Bug with KV Caching in Llama (original) model · Issue #25420 · huggingface/transformers · GitHub

In short: using the KV cache can change the logits, because cached and uncached forward passes take different computation paths and therefore accumulate floating-point results in a different order. The effect is especially noticeable when the model is loaded in 16-bit precision, where rounding error is much larger.
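The root cause is that floating-point addition is not associative, and the error grows as precision shrinks. A minimal sketch (using NumPy's float16, not the transformers library itself) shows how summing the same numbers in a different order gives different results in half precision, which is exactly what happens when attention reductions are split differently between cached and uncached passes:

```python
import numpy as np

# Three float16 values: 1.0 plus two tiny increments of 2**-11.
# 2**-11 is exactly half a ULP of 1.0 in float16 (eps = 2**-10).
a = np.float16(1.0)
b = np.float16(2.0 ** -11)
c = np.float16(2.0 ** -11)

# Order 1: add the increments to 1.0 one at a time.
# Each addition ties-to-even and rounds back down to 1.0.
left = (a + b) + c   # -> 1.0

# Order 2: combine the increments first (2**-11 + 2**-11 = 2**-10),
# which is now large enough to survive the rounding step.
right = a + (b + c)  # -> 1.0009765625

print(left, right, left == right)  # 1.0 1.001 False
```

The same inputs, summed in a different order, yield different half-precision results; in a model with millions of such accumulations per layer, the cached and uncached logits drift apart slightly, so small differences like the ones in the linked issue are expected rather than a correctness bug.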
