I am specifically referring to the ConversationalRetrievalChain chain.
Basically, this chain maintains chat history by combining the prompt with previous question–answer pairs (the chat history) when sending the next question to the GPT model. That way the model has some context and behaves as if it remembers the previous conversation.
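To illustrate what I mean (this is only a conceptual sketch, not the chain's actual internals), the history gets stitched into the next prompt roughly like this:

```python
# Conceptual sketch (NOT ConversationalRetrievalChain's real code):
# previous (question, answer) pairs are folded into the next prompt
# so the model appears to "remember" the conversation.

def build_prompt(chat_history: list[tuple[str, str]], question: str) -> str:
    """Combine earlier exchanges with the new question into one prompt."""
    lines = []
    for q, a in chat_history:
        lines.append(f"Human: {q}")
        lines.append(f"Assistant: {a}")
    lines.append(f"Human: {question}")
    return "\n".join(lines)

history = [("What is LangChain?", "A framework for LLM apps.")]
prompt = build_prompt(history, "Does it manage token limits?")
print(prompt)
```

Every new question makes this prompt longer, which is exactly where my concern comes from.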
My question is: as the conversation history grows larger and larger, at some point the new question plus the history will exceed the model's token limit. Does the chain have some built-in capability to start deleting the earliest exchanges, so you never get errors as you continue chatting?
Alternatively, do we need to write extra code ourselves to count the tokens and, if the limit is exceeded, use something like pop(0) to start deleting the earliest exchanges?
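The manual approach I have in mind would look roughly like this (a sketch only: the 1-token-per-4-characters estimate is a crude stand-in for a real tokenizer such as tiktoken, and the token budget is arbitrary for the example):

```python
# Sketch of the pop(0) idea: drop the earliest exchanges until the
# history fits within a token budget.

def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 chars per token); swap in a real tokenizer
    # like tiktoken for accurate counts.
    return max(1, len(text) // 4)

def trim_history(history: list[tuple[str, str]],
                 max_tokens: int) -> list[tuple[str, str]]:
    """Delete the earliest (question, answer) pairs until the total fits."""
    history = list(history)  # avoid mutating the caller's list

    def total(h):
        return sum(estimate_tokens(q) + estimate_tokens(a) for q, a in h)

    while history and total(history) > max_tokens:
        history.pop(0)  # remove the earliest exchange first
    return history

history = [
    ("old question " * 20, "old answer " * 20),  # a long early exchange
    ("recent question", "recent answer"),
]
trimmed = trim_history(history, max_tokens=40)
print(len(trimmed))  # → 1: the oversized early exchange was dropped
```

This works, but it feels like something the chain ought to handle for me.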
I couldn’t find such a function within the chain. I did find a property called max_tokens_limit, but it only limits the size of the relevant document chunks retrieved from the vector store before they are passed to the model:
> **field** max_tokens_limit: Optional[int] = None
> If set, restricts the docs to return from store based on tokens, enforced only for StuffDocumentChain
That is not what I am after. The most efficient way to achieve this would seem to be counting the tokens of the chat history before the user asks the next question.