KV cache sizing
I need help understanding the KV cache.
- Is the KV cache fixed to a specific size (max_seq_length)? From the generation utils it appears there is no limit on its size. IIUC, `past_key_values` are always forwarded without limiting the size.
- If it is fixed, how are the older entries of the KV cache replaced?
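To make the two cases concrete, here is a toy sketch in plain Python of what I mean. It is not the transformers code: `ToyKVCache` and `max_len` are hypothetical names I made up, and the sliding-window eviction is only one guess at how a fixed-size cache could replace older entries.

```python
from collections import deque


class ToyKVCache:
    """Toy single-layer KV cache (hypothetical, not the transformers implementation)."""

    def __init__(self, max_len=None):
        # max_len=None: the cache grows without bound (what I think I am seeing).
        # max_len=int: the cache behaves like a sliding window (assumed eviction policy).
        self.keys = deque(maxlen=max_len)
        self.values = deque(maxlen=max_len)

    def append(self, k, v):
        # A deque with maxlen set silently drops its oldest entry once it is full.
        self.keys.append(k)
        self.values.append(v)

    def __len__(self):
        return len(self.keys)


unbounded = ToyKVCache()
windowed = ToyKVCache(max_len=4)

# One entry is appended per generated token.
for step in range(10):
    unbounded.append(f"k{step}", f"v{step}")
    windowed.append(f"k{step}", f"v{step}")

print(len(unbounded))       # 10
print(len(windowed))        # 4
print(list(windowed.keys))  # ['k6', 'k7', 'k8', 'k9']
```

The first case is the behavior I believe the generation utils have; the second is what I would expect if the cache were capped at max_seq_length.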
Thanks