KV cache sizing

I need help understanding the KV cache.

  1. Is the KV cache fixed to a specific size (max_seq_length)? From the generation utils, it appears there is no limit on its size. IIUC, past_key_values are always passed forward without any size limit.
  2. If yes, how are the older entries of the KV cache replaced?
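
To make the question concrete, here is a toy sketch of the behavior I think I am seeing: each decoding step appends one key/value entry per layer and nothing is ever dropped. This is not the library's code. `append_step`, `cached_length`, and `NUM_LAYERS` are hypothetical names for illustration, and strings stand in for the real tensors.

```python
# Toy model of an append-only KV cache (hypothetical, not library code).

def append_step(past_key_values, new_key, new_value):
    """Return a cache with one more (key, value) entry per layer; nothing is evicted."""
    if past_key_values is None:
        # First step: start one (keys, values) pair per layer.
        return [([k], [v]) for k, v in zip(new_key, new_value)]
    return [
        (keys + [k], values + [v])
        for (keys, values), k, v in zip(past_key_values, new_key, new_value)
    ]


def cached_length(past_key_values):
    """Number of cached positions (taken from the first layer)."""
    return 0 if past_key_values is None else len(past_key_values[0][0])


NUM_LAYERS = 2  # assumed toy value
cache = None
lengths = []
for step in range(5):
    # One fake key/value per layer for the token generated at this step.
    new_key = [f"k{step}_layer{layer}" for layer in range(NUM_LAYERS)]
    new_value = [f"v{step}_layer{layer}" for layer in range(NUM_LAYERS)]
    cache = append_step(cache, new_key, new_value)
    lengths.append(cached_length(cache))

print(lengths)  # [1, 2, 3, 4, 5]
```

If this picture is right, the cache length simply tracks the number of tokens processed so far, which is why I am asking whether anything caps it or evicts old entries.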

Thanks