Hello everyone, this is my first post on the forums.
I tried to find an existing topic that would answer my question, but since I couldn't find one, I decided to start this one.
I have a question about context length, and specifically about how practical it is.
Now, you can call me a malcontent, but isn't a context length of 4096 tokens, which is what you can set for LLaMA 2 or Falcon (not to mention 2048 tokens), far too little for a longer conversation?
To be specific: I am using maddes8cht/ehartford-WizardLM-Uncensored-Falcon-40b-gguf · Hugging Face by Mathias Bachmann, known as @maddes8cht, in LM Studio for macOS. When I pass shorter and longer portions of text over the course of a conversation, I quickly not only reach the context length limit, whether 2048 or 4096 tokens (2048 seems to be the standard value, whereas 4096 is simply what I tried myself), but even exceed it.

This has happened to me more than once, and my observation is this: although you can continue the conversation after exceeding the context length limit, the AI chat no longer remembers the discussion from before the limit was exceeded. Is that correct? It also looks as if, once the limit has been reached and then exceeded, whatever is discussed afterwards is counted as though you had started again from 0 tokens (in other words, starting anew). Working this way, the AI chat doesn't seem to remember what was discussed earlier, and it cannot refer back to the earlier discussion on its own. If, for example, you ask it about something that was discussed or mentioned earlier, it says (one way or another) that it has not seen any such thing.
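For anyone wondering what I think is going on: my understanding is that chat frontends simply drop the oldest messages once the conversation no longer fits in the context window, so those messages are never sent to the model again. Here is a minimal sketch of that idea in Python; the 4-characters-per-token estimate and the message structure are my own simplifications, not LM Studio's actual implementation.

```python
# Minimal sketch of how a chat frontend might enforce a context window.
# The tokenizer heuristic and message format are assumptions for
# illustration only, not how LM Studio really works internally.

CONTEXT_LIMIT = 2048  # tokens


def estimate_tokens(text: str) -> int:
    """Rough heuristic: roughly 4 characters per token for English text."""
    return max(1, len(text) // 4)


def truncate_history(messages: list[str], limit: int = CONTEXT_LIMIT) -> list[str]:
    """Keep the newest messages that fit within the token limit.

    Older messages are dropped entirely, which is why the model appears
    to 'forget' the start of a long conversation: that text is simply
    no longer part of the prompt it receives.
    """
    kept: list[str] = []
    total = 0
    for msg in reversed(messages):  # walk from newest to oldest
        cost = estimate_tokens(msg)
        if total + cost > limit:
            break
        kept.append(msg)
        total += cost
    return list(reversed(kept))  # restore chronological order


history = ["old message " * 300, "recent question?"]
window = truncate_history(history, limit=100)
print(window)  # only the recent message still fits
```

If something like this is what happens under the hood, it would explain exactly the behaviour I described: the model answers as if the earlier discussion never took place, because from its point of view it never did.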
I have been trying to contact @maddes8cht to ask what the exact maximum context length for his model is (I was unable to find specific information on this for the Falcon architecture in particular), but none of Hugging Face, GitHub or Twitter (now X) supports sending private messages. Perhaps someone here knows the maximum context length for the Falcon architecture? I know that both 2048 and 4096 work fine, but when, for testing purposes, I doubled the latter to 8192, the model began generating gibberish. Since 8192 already looks to be too much, trying even greater values seems pointless.
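One way I have seen suggested for checking a model's trained context window, rather than guessing, is to look at its `config.json` in the Hugging Face repo: many architectures record it under a field such as `max_position_embeddings`. A minimal sketch, where the config contents below are a made-up placeholder (the real file for this Falcon model may use a different field name or value, so please check the actual repo):

```python
import json

# Hypothetical config.json contents for illustration only; the real
# Falcon config on Hugging Face may differ, so treat the field name
# and the value 2048 here as assumptions, not facts about the model.
config_text = '{"model_type": "falcon", "max_position_embeddings": 2048}'

config = json.loads(config_text)
print(config.get("max_position_embeddings", "not specified"))
```

If someone can confirm what the relevant field says for this particular GGUF conversion, that would already answer half of my question.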
Am I right that the AI chat does not remember what was discussed earlier once you exceed the context length limit? And don't such limitations make AI chats more toys for fun than practical tools for work?