RAG LLM Generating the Prompt also at the response

Arpx22 · February 28, 2024, 6:08pm

I am trying to build a RAG pipeline using open-source LLMs, but when generating the response the LLM includes the entire prompt and the retrieved documents in the output. Can anyone please tell me how to remove the prompt and the Question section so that the response contains only the Answer?

Code:
from langchain_community.document_loaders import PyPDFLoader

loader = PyPDFLoader("EM_Theory.pdf")
pages = loader.load_and_split()

from langchain.text_splitter import RecursiveCharacterTextSplitter

text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=20)
text_chunks = text_splitter.split_documents(pages)

from langchain_community.embeddings import HuggingFaceEmbeddings
from langchain_community.vectorstores import FAISS

embeddings = HuggingFaceEmbeddings(model_name='sentence-transformers/all-mpnet-base-v2')
vector_store = FAISS.from_documents(text_chunks, embedding=embeddings)

from langchain.prompts import PromptTemplate
from langchain.chains import ConversationalRetrievalChain
from langchain.memory import ConversationBufferMemory
prompt_template = PromptTemplate(input_variables=['chat_history', 'question'],
    template='''Given the following conversation and a follow up question,
rephrase the follow up question to be a standalone question,
in its original language. Only generate the answer of the asked question.
Don't generate the contexts and questions in output
\n\nChat History:\n{chat_history}\nFollow Up Input: {question}''')
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

# `llm` is the language model wrapper, created elsewhere
chain = ConversationalRetrievalChain.from_llm(llm=llm, chain_type='stuff', condense_question_prompt=prompt_template,
    retriever=vector_store.as_retriever(search_kwargs={"k": 2}),
    memory=memory)
query = "what is the Maxwell's equation?"
history = []
result = chain({"question": query, "chat_history": history})
history.append((query, result["answer"]))

print(result)

Output:
{‘question’: ‘what is the Maxwell’s equation?’,
‘chat_history’: [HumanMessage(content=‘what is the Maxwell’s equation?’),
AIMessage(content=“Use the following pieces of context to answer the question at the end. If you don’t know the answer, just say that you don’t know, don’t try to make up an answer.\n\nLet’s play physics 9681634157 \n10 \n \n \n \n \nWAVE EQUATION IN FREE SPACE \n Write down Maxwell’s equation in free space. Obtain the wave equation for electric \nfield intensity from them. CU 1010, 09 ,06, 01 \n OR \nShow that Maxwell’s equations suggest propagation of electromagnetic wave in a linear \nhomogeneous dielectric medium having no free charge. CU 2014 \n OR \n \nDerive the expression of speed of light from Maxwell’s equations. CU 2015 \n \n Show that for a plane em wave in free space, the unit vector in the direction of \npropagation the electric and magnetic fields are mutually perpendi cular. 4 \n OR CU2011 ,06 , 05\n\nLet’s play physics 9681634157 \n16 \n 𝛻⃗ ×𝐻⃗⃗ =𝐽 +𝜕𝐷⃗⃗ \n𝜕𝑡 \n ∴ 𝛻⃗ ∙(𝐸⃗ ×𝐻⃗⃗ )=𝐻⃗⃗ ∙(𝛻⃗ ×𝐸⃗ )−𝐸⃗ ∙(𝛻⃗ ×𝐻⃗⃗ ) \n \n=−𝐻⃗⃗ ∙𝜕𝐵⃗ \n𝜕𝑡−𝐸⃗ ∙(𝐽 +𝜕𝐷⃗⃗ \n𝜕𝑡) \n=−𝐻⃗⃗ ∙𝜕𝐵⃗ \n𝜕𝑡−𝐸⃗ ∙𝐽 −𝐸⃗ ∙𝜕𝐷⃗⃗ \n𝜕𝑡 \n For a linear medium 𝐷⃗⃗ =∈𝐸⃗ & 𝐵⃗ =𝜇𝐻⃗⃗ \n∴ 𝛻⃗ ∙(𝐸⃗ ×𝐻⃗⃗ )=−1\n2𝜕\n𝜕𝑡(𝐻⃗⃗ ∙𝐵⃗ )−1\n2𝜕\n𝜕𝑡(𝐸⃗ ∙𝐷⃗⃗ )−𝐸⃗ ∙𝐽 \n=−𝜕\n𝜕𝑡(1\n2𝐻⃗⃗ ∙𝐵⃗ +1\n2𝐸⃗ ∙𝐷⃗⃗ )−𝐸⃗ ∙𝐽 \nIntegrating above equations over a volume 𝑉 bounded by closed surface 𝑆 and \napplying divergence theorem, \n∮(𝐸⃗ ×𝐻⃗⃗ ) \n𝑆∙𝑑𝑆 =−𝑑\n𝑑𝑡∫1\n2(𝐸⃗ ∙𝐷⃗⃗ +𝐵⃗ ∙𝐻⃗⃗ )𝑑𝑉−∫𝐸⃗ ∙𝐽 \n𝑣 \n𝑣 𝑑𝑉 \n \n𝑂𝑟,∮(𝐸⃗ ×𝐻⃗⃗ ) \n𝑆∙𝑑𝑆 +∫𝐸⃗ ∙𝐽 \n𝑣 𝑑𝑉=−𝑑\n𝑑𝑡∫1\n2 \n𝑣(𝐸⃗ ∙𝐷⃗⃗ +𝐵⃗ ∙𝐻⃗⃗ )𝑑𝑉 \n It is the mathematical form of Poynting’s theorem. \nLet us now find a physical meaning of this equation. \na. The rate of work done by E.M. force on an element charge 𝑑𝑞 (=𝜌 𝑑𝑉) is given \nby,\n\nQuestion: what is the Maxwell’s equation?\nHelpful Answer: I do not have enough information about Maxell’s equation therefore I cannot provide an answer.”)],
‘answer’: “Use the following pieces of context to answer the question at the end. If you don’t know the answer, just say that you don’t know, don’t try to make up an answer.\n\nLet’s play physics 9681634157 \n10 \n \n \n \n \nWAVE EQUATION IN FREE SPACE \n Write down Maxwell’s equation in free space. Obtain the wave equation for electric \nfield intensity from them. CU 1010, 09 ,06, 01 \n OR \nShow that Maxwell’s equations suggest propagation of electromagnetic wave in a linear \nhomogeneous dielectric medium having no free charge. CU 2014 \n OR \n \nDerive the expression of speed of light from Maxwell’s equations. CU 2015 \n \n Show that for a plane em wave in free space, the unit vector in the direction of \npropagation the electric and magnetic fields are mutually perpendi cular. 4 \n OR CU2011 ,06 , 05\n\nLet’s play physics 9681634157 \n16 \n 𝛻⃗ ×𝐻⃗⃗ =𝐽 +𝜕𝐷⃗⃗ \n𝜕𝑡 \n ∴ 𝛻⃗ ∙(𝐸⃗ ×𝐻⃗⃗ )=𝐻⃗⃗ ∙(𝛻⃗ ×𝐸⃗ )−𝐸⃗ ∙(𝛻⃗ ×𝐻⃗⃗ ) \n \n=−𝐻⃗⃗ ∙𝜕𝐵⃗ \n𝜕𝑡−𝐸⃗ ∙(𝐽 +𝜕𝐷⃗⃗ \n𝜕𝑡) \n=−𝐻⃗⃗ ∙𝜕𝐵⃗ \n𝜕𝑡−𝐸⃗ ∙𝐽 −𝐸⃗ ∙𝜕𝐷⃗⃗ \n𝜕𝑡 \n For a linear medium 𝐷⃗⃗ =∈𝐸⃗ & 𝐵⃗ =𝜇𝐻⃗⃗ \n∴ 𝛻⃗ ∙(𝐸⃗ ×𝐻⃗⃗ )=−1\n2𝜕\n𝜕𝑡(𝐻⃗⃗ ∙𝐵⃗ )−1\n2𝜕\n𝜕𝑡(𝐸⃗ ∙𝐷⃗⃗ )−𝐸⃗ ∙𝐽 \n=−𝜕\n𝜕𝑡(1\n2𝐻⃗⃗ ∙𝐵⃗ +1\n2𝐸⃗ ∙𝐷⃗⃗ )−𝐸⃗ ∙𝐽 \nIntegrating above equations over a volume 𝑉 bounded by closed surface 𝑆 and \napplying divergence theorem, \n∮(𝐸⃗ ×𝐻⃗⃗ ) \n𝑆∙𝑑𝑆 =−𝑑\n𝑑𝑡∫1\n2(𝐸⃗ ∙𝐷⃗⃗ +𝐵⃗ ∙𝐻⃗⃗ )𝑑𝑉−∫𝐸⃗ ∙𝐽 \n𝑣 \n𝑣 𝑑𝑉 \n \n𝑂𝑟,∮(𝐸⃗ ×𝐻⃗⃗ ) \n𝑆∙𝑑𝑆 +∫𝐸⃗ ∙𝐽 \n𝑣 𝑑𝑉=−𝑑\n𝑑𝑡∫1\n2 \n𝑣(𝐸⃗ ∙𝐷⃗⃗ +𝐵⃗ ∙𝐻⃗⃗ )𝑑𝑉 \n It is the mathematical form of Poynting’s theorem. \nLet us now find a physical meaning of this equation. \na. The rate of work done by E.M. force on an element charge 𝑑𝑞 (=𝜌 𝑑𝑉) is given \nby,\n\nQuestion: what is the Maxwell’s equation?\nHelpful Answer: I do not have enough information about Maxell’s equation therefore I cannot provide an answer.”}

I have also tried mistralai/Mistral-7B-Instruct-v0.2, NousResearch/Nous-Hermes-2-Mixtral-8x7B-DPO and mistralai/Mixtral-8x7B-Instruct-v0.1, but got the same kind of result.
LangChain version: 0.1.9

Can anyone help me solve this issue?

caricaa · March 1, 2024, 8:43pm

I have the same problem. Have you found a solution?
When I use a text2text-generation model instead, though, the prompt doesn't appear in the response.

Arpx22 · March 2, 2024, 5:46am

No, I have not found a solution yet, so I am using a regex to work around the issue:

import re
result['answer'] = re.split('Answer:', result['answer'])[-1]
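
The same workaround can be sketched without a regex. This is only an illustration: the helper name `extract_answer` is made up, and the default marker "Helpful Answer:" is an assumption taken from the output shown in the first post, so it may differ for other prompts.

```python
def extract_answer(generated_text: str, marker: str = "Helpful Answer:") -> str:
    """Return the text after the last occurrence of `marker`.

    If the marker is not present, the input is returned unchanged
    (apart from surrounding whitespace).
    """
    # rpartition splits on the LAST occurrence; when the marker is missing,
    # the whole string ends up in the third element.
    _, _, tail = generated_text.rpartition(marker)
    return tail.strip()


# Shortened stand-in for the output shown in the first post.
full_output = (
    "Use the following pieces of context to answer the question at the end.\n\n"
    "Question: what is the Maxwell's equation?\n"
    "Helpful Answer: I do not have enough information."
)
print(extract_answer(full_output))
```

Note that this still depends on the marker text: if the model writes the marker again inside its own answer, everything before that last occurrence is dropped.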

aparnakesarkar · March 6, 2024, 9:03pm

I am facing the same issue with Llama 2.

CKeibel · March 7, 2024, 7:04am

I found the parameter return_only_outputs in the LangChain documentation for ConversationalRetrievalChain; maybe that will help. However, it is marked as deprecated.

Usually this problem is decoder-related. During generation, each new token is appended to the input sequence, and the extended sequence is fed back into the model to produce the next token until the EOS token is generated. The returned sequence therefore starts with the prompt tokens. In Hugging Face Transformers, something like this can be done to decode only the new tokens:

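# assumes `tokenizer`, `model` and `prompt` already exist and the model is on the GPU
tokenized_prompt = tokenizer(prompt, return_tensors="pt").to("cuda")
# generate new tokens (the returned sequence begins with the prompt tokens)
outputs = model.generate(**tokenized_prompt)
# decode only the new tokens to a string
tokenizer.decode(outputs[0][len(tokenized_prompt.input_ids[0]):])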

Since we know the tokenized input length of the prompt (len(tokenized_prompt.input_ids[0])), we can give the sequence to the decoder and only decode from the end of the input sequence.

Perhaps something similar to the Hugging Face Transformers approach would be possible on the LangChain result, instead of split:

answer = result['answer'][len(query):]

However, you need to make sure that you have the string text of the prompt that was actually sent to the model, because that is the length you have to cut off. I think LangChain always wraps everything in its own classes, which is why I don't like working with it: you give up some control.
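
A minimal sketch of that string-level idea, assuming you can get hold of the exact prompt text that was sent to the model. The helper name `strip_prompt` and the example strings are hypothetical, not part of LangChain:

```python
def strip_prompt(generated_text: str, prompt_text: str) -> str:
    """Remove `prompt_text` from the start of `generated_text`, if present."""
    if generated_text.startswith(prompt_text):
        # Cut exactly the prompt length, then drop leading whitespace.
        return generated_text[len(prompt_text):].lstrip()
    # The prompt was not echoed (or was altered), so leave the text untouched.
    return generated_text


# Hypothetical full prompt (template + context + question) and model output.
prompt_text = "Context: ...\n\nQuestion: what is the Maxwell's equation?\nHelpful Answer:"
generated_text = prompt_text + " I do not have enough information."
print(strip_prompt(generated_text, prompt_text))
```

Checking with startswith first avoids cutting into the answer when the model did not echo the prompt verbatim.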

KJR · April 17, 2024, 8:15pm

I ran into this problem over the last few days.
The only solution I found was to downgrade LangChain to version 0.1.6. After that it worked fine again.

KushwanthK · May 19, 2024, 1:02am

I also got the same problem with Llama 3.

vpkprasanna · May 19, 2024, 4:24pm

I would highly recommend following this link to understand the prompt format for Llama 2.

aadilgani123 · September 25, 2024, 11:10am

If you're using the transformers pipeline, set return_full_text=False.

Example:

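import transformers

# assumes `model` and `tokenizer` are already loaded
pipe = transformers.pipeline(
      "text-generation",
      model=model,
      tokenizer=tokenizer,
      device_map="auto",
      max_new_tokens=512,
      do_sample=True,
      return_full_text=False,  # return only the newly generated text, without the prompt
      top_k=10,
      num_return_sequences=1,
      eos_token_id=tokenizer.eos_token_id
)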


from langchain.llms import HuggingFacePipeline
llm = HuggingFacePipeline(pipeline=pipe, model_kwargs={'temperature':0.1})
