Context: I am trying to query Llama-2 7B, taken from HuggingFace (meta-llama/Llama-2-7b-hf). I give it a question and context (I would guess anywhere from 200-1000 tokens), and ask it to answer the question based on the context (context is retrieved from a vectorstore using similarity search). Here are my two problems:
- The answer ends, and the rest of the tokens until it reaches max_new_tokens are all newlines. Or it just doesn’t generate any text and the entire response is newlines. Adding a repetition_penalty of 1.1 or greater has solved infinite newline generation, but does not get me full answers.
- For answers that do generate, they are copied word for word from the given context. This remains the same with repetition_penalty=1.1, and making the repetition penalty too high makes the answer nonsense.
The only temperature I have tried is temperature=0.8, but from what I have seen so far, tuning temperature and repetition_penalty results in either the context being copied or a nonsensical answer.
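For reference, the settings described above look roughly like the following. This is a minimal sketch, assuming the standard transformers generate API; the names GEN_KWARGS and generate_answer are hypothetical, do_sample=True is an assumption, and max_new_tokens=150 is only an illustrative value, so this is not necessarily the exact code I am running.

```python
# Sampling settings mentioned in this post.
# max_new_tokens=150 is illustrative; do_sample=True is an assumption.
GEN_KWARGS = {
    "max_new_tokens": 150,
    "do_sample": True,  # temperature only has an effect when sampling is enabled
    "temperature": 0.8,
    "repetition_penalty": 1.1,
}


def generate_answer(prompt, model_name="meta-llama/Llama-2-7b-hf", **overrides):
    """Hypothetical helper: generate a continuation and return only the new text."""
    # Imported lazily so the settings above can be inspected without loading the library.
    from transformers import AutoModelForCausalLM, AutoTokenizer

    tokenizer = AutoTokenizer.from_pretrained(model_name)
    model = AutoModelForCausalLM.from_pretrained(model_name)

    inputs = tokenizer(prompt, return_tensors="pt")
    # Per-call overrides take precedence over the defaults in GEN_KWARGS.
    output_ids = model.generate(**inputs, **{**GEN_KWARGS, **overrides})

    # Slice off the prompt tokens so only the generated continuation is decoded.
    new_tokens = output_ids[0][inputs["input_ids"].shape[1]:]
    return tokenizer.decode(new_tokens, skip_special_tokens=True)
```

Decoding only the tokens after the prompt makes it easier to tell whether the model is copying the context or whether the prompt itself is simply being echoed back in the output.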
Note about the “context”: I am using a document stored in a Chroma vector store, and similarity search retrieves the relevant information before I pass it to Llama.
My query is to summarize a certain Topic X.
query = "Summarize Topic X"
The retrieved context from the vectorstore has 3 sources that look something like this (I format the sources in my query to the LLM separated by newlines):
context = """When talking about Topic X, Scenario Y is always referred to. This is due to the relation of
Topic X is a broad topic which covers many aspects of life.
No one knows when Topic X became a thing, its origin is unknown even to this day."""
Then the response from Llama-2 directly mirrors one piece of context, and includes no information from the others. Furthermore, it produces many newlines after the answer. If the answer is 100 tokens, and max_new_tokens is 150, I have 50 newlines.
response = "When talking about Topic X, Scenario Y is always referred to. This is due to the relation of \n\n\n\n"
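To make the example above concrete, here is a small sketch of how the sources are joined into a prompt and how the newline padding can be measured. The prompt template and the helper names build_prompt and count_trailing_newlines are hypothetical, shown only for illustration; the only detail taken from my setup is that the sources are separated by newlines.

```python
def build_prompt(query, sources):
    """Hypothetical template: sources joined by newlines, then the question."""
    context = "\n".join(sources)
    return f"Context:\n{context}\n\nQuestion: {query}\nAnswer:"


def count_trailing_newlines(text):
    """Count the newline characters at the very end of the text."""
    return len(text) - len(text.rstrip("\n"))


sources = [
    "When talking about Topic X, Scenario Y is always referred to. This is due to the relation of",
    "Topic X is a broad topic which covers many aspects of life.",
    "No one knows when Topic X became a thing, its origin is unknown even to this day.",
]
prompt = build_prompt("Summarize Topic X", sources)

response = "When talking about Topic X, Scenario Y is always referred to. This is due to the relation of \n\n\n\n"
padding = count_trailing_newlines(response)
print(padding)
```

For the sample response above this counts 4 trailing newlines; on a real generation it shows how much of the max_new_tokens budget went to padding rather than to the answer.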
One of my biggest issues is that in addition to copying one piece of context, if the context ends mid-sentence, so does the LLM response.
Is anyone else experiencing anything like this (newline issue or copying part of your input prompt)? Has anyone found a solution?