This is an offshoot of the thread "Why past_key_values is not in GreedySearchDecoderOnlyOutput?" (Transformers, Hugging Face Forums).
What does use_cache in generate actually do? Since there is no way to pass past_key_values to generate, does that mean generate stores past_key_values internally between calls (say, when I call generate in a loop for a chatbot)?
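
To make the question concrete, here is a toy sketch of what key/value caching means during decoding. It is not the Transformers implementation: toy_generate, key_value, next_token, the vocabulary size and the 2-dimensional embedding are all hypothetical names and values made up for illustration. It assumes the common design in which the cache holds per-token keys and values that are reused across decoding steps inside a single call and is dropped when that call returns; whether a given library version behaves exactly this way is worth checking against its documentation.

```python
import math

VOCAB = 8  # hypothetical toy vocabulary size


def embed(token):
    # Deterministic toy embedding (2-dimensional), not a trained one.
    return [math.sin(token + 1.0), math.cos(2.0 * token)]


def key_value(token, counter):
    # Stand-in for the per-token key/value projections of one attention layer.
    # counter[0] tracks how many times this (expensive) step is executed.
    counter[0] += 1
    x = embed(token)
    key = [x[0] + x[1], x[0] - x[1]]
    value = [2.0 * x[0], 0.5 * x[1]]
    return key, value


def next_token(query_token, keys, values):
    # Single-head attention of the last token over all available positions.
    q = embed(query_token)
    scores = [q[0] * k[0] + q[1] * k[1] for k in keys]
    top = max(scores)
    weights = [math.exp(s - top) for s in scores]
    total = sum(weights)
    ctx = [sum(w * v[i] for w, v in zip(weights, values)) / total for i in range(2)]
    # Greedy choice: the vocabulary entry whose embedding best matches the context.
    logits = [ctx[0] * embed(t)[0] + ctx[1] * embed(t)[1] for t in range(VOCAB)]
    return max(range(VOCAB), key=lambda t: logits[t])


def toy_generate(prompt, max_new_tokens, use_cache):
    counter = [0]
    tokens = list(prompt)
    keys, values = [], []  # local to this call; nothing survives after return
    for _ in range(max_new_tokens):
        if use_cache:
            # Only tokens that are not in the cache yet need keys/values.
            new_tokens = tokens[len(keys):]
        else:
            # Without a cache, every step recomputes keys/values for all tokens.
            keys, values = [], []
            new_tokens = tokens
        for t in new_tokens:
            k, v = key_value(t, counter)
            keys.append(k)
            values.append(v)
        tokens.append(next_token(tokens[-1], keys, values))
    return tokens, counter[0]


cached_tokens, cached_calls = toy_generate([1, 2, 3], 5, use_cache=True)
plain_tokens, plain_calls = toy_generate([1, 2, 3], 5, use_cache=False)
print(cached_tokens == plain_tokens)  # same output either way
print(cached_calls, plain_calls)      # far fewer key/value computations with the cache
```

In this sketch the flag changes only how much work each decoding step repeats, not the generated tokens, and a second call to toy_generate starts from an empty cache. If the real generate works the same way, a chatbot loop would re-encode the whole conversation history on every call unless a cache can be passed in explicitly.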