Using the Hugging Face transformers library, I see different outputs when generating text with model.generate with and without the use_cache argument. Is this intended, and how can I work around it? The scores when I use the cache (from the second generated token onwards) are different. AFAIK use_cache is an op…

Model.generate use_cache=True generates different results than use_cache=False

John6666 March 4, 2025, 1:24pm 2

Similar case?

| Topic | Category | Replies | Views | Date |
|---|---|---|---|---|
| Outputs change if re-using KVCache (past_key_values) for model.forward and generation | 🤗Transformers | 5 | 242 | January 22, 2025 |
| Transformer KV-Cache Produces Worse Output Than Normal Generation – Why? | Beginners | 1 | 219 | March 3, 2025 |
| What is the purpose of 'use_cache' in decoder? | 🤗Transformers | 5 | 23808 | July 4, 2023 |
| What does the `use_cache` in `generate` actually do? | 🤗Transformers | 1 | 2394 | May 9, 2024 |
| Using gradient checkpointing and KV caching when generation happens in no grad context | 🤗Transformers | 2 | 295 | September 28, 2024 |
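
For what it's worth, here is a toy sketch of why the cached and uncached paths should agree mathematically. It is not the transformers implementation: it is a single attention head in plain Python, and every name in it (`attend`, `project`, `last_output_no_cache`, `last_output_with_cache`, the random weights) is made up for illustration. The point is only that appending keys and values to a cache step by step gives the last token the same inputs as recomputing them all from scratch.

```python
import math
import random


def softmax(xs):
    # Numerically stable softmax over a list of floats.
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def project(x, W):
    # Vector-matrix product x @ W; stands in for a learned linear projection.
    return [sum(xi * wij for xi, wij in zip(x, col)) for col in zip(*W)]


def attend(q, keys, values):
    # Scaled dot-product attention output for a single query vector.
    scale = 1.0 / math.sqrt(len(q))
    weights = softmax([scale * sum(a * b for a, b in zip(q, k)) for k in keys])
    dim = len(values[0])
    return [sum(w * v[j] for w, v in zip(weights, values)) for j in range(dim)]


random.seed(0)
d = 4
# Hypothetical toy embeddings and projection weights (not from any real model).
tokens = [[random.gauss(0, 1) for _ in range(d)] for _ in range(5)]
Wq, Wk, Wv = ([[random.gauss(0, 1) for _ in range(d)] for _ in range(d)] for _ in range(3))


def last_output_no_cache(tokens):
    # Analogue of use_cache=False: recompute K and V for every position.
    keys = [project(t, Wk) for t in tokens]
    values = [project(t, Wv) for t in tokens]
    return attend(project(tokens[-1], Wq), keys, values)


def last_output_with_cache(tokens):
    # Analogue of use_cache=True: append one K/V pair per step and reuse the rest.
    k_cache, v_cache = [], []
    out = None
    for t in tokens:
        k_cache.append(project(t, Wk))
        v_cache.append(project(t, Wv))
        out = attend(project(t, Wq), k_cache, v_cache)
    return out


full = last_output_no_cache(tokens)
cached = last_output_with_cache(tokens)
max_diff = max(abs(a - b) for a, b in zip(full, cached))
# Compare with a tolerance rather than exact equality.
matches = all(math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(full, cached))
print(max_diff, matches)
```

In a real model the two paths run differently shaped matrix multiplications (one new token against the cache versus the whole sequence), so as far as I understand small floating-point differences in the scores are expected, especially in fp16/bf16. If that is what you are seeing, comparing the scores with a tolerance (for example `torch.allclose`) instead of exact equality is probably the more meaningful check.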