Need help understanding the KV cache
- Is the KV cache fixed to a specific size (max_seq_length)? From the generation utils, it appears there is no limit on its size. IIUC,
past_key_values is always forwarded without any limit on its size.
- If it is fixed, how are the older entries of the KV cache replaced?
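
To make the question concrete, here is a toy sketch of the two behaviours I am asking about. It is not the library's implementation: `ToyKVCache`, `max_len`, and `decode` are hypothetical names, and the string keys/values stand in for real tensors. Without a limit the cache grows by one entry per decoding step; with a limit, I am assuming the oldest entry would be dropped (a sliding window), which is only one possible eviction policy.

```python
from collections import deque
from typing import Deque, Optional, Tuple


class ToyKVCache:
    """Hypothetical per-layer cache; NOT how transformers implements it."""

    def __init__(self, max_len: Optional[int] = None) -> None:
        # deque(maxlen=None) grows without bound; with maxlen=n it
        # silently discards the oldest entry once n entries are stored.
        self.entries: Deque[Tuple[str, str]] = deque(maxlen=max_len)

    def update(self, key: str, value: str) -> int:
        """Append one (key, value) pair and return the current cache length."""
        self.entries.append((key, value))
        return len(self.entries)


def decode(cache: ToyKVCache, num_steps: int) -> int:
    """Simulate num_steps decoding steps, each adding one entry to the cache."""
    length = 0
    for step in range(num_steps):
        length = cache.update(f"k{step}", f"v{step}")
    return length


unbounded = ToyKVCache()          # what the generation utils appear to do
windowed = ToyKVCache(max_len=4)  # assumed fixed-size alternative

print(decode(unbounded, 10))  # 10
print(decode(windowed, 10))   # 4
print(windowed.entries[0])    # ('k6', 'v6')
```

In the windowed case, the entries for steps 0 to 5 are gone after ten steps, so the model could no longer attend to those positions. Which of these two behaviours matches what actually happens is what I would like to confirm.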