Hello fellow HFers,
I am running open-source LLMs such as Bloom, Llama2-7b-hf, and Flan-T5, and I am hitting the same issue with each: the response from the conversational retrieval chain is truncated after a certain number of characters. I tried searching online and looking through the API parameters but couldn't find anything. Here is the relevant code snippet:
```python
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.llms import HuggingFaceHub
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import FAISS


def get_vector_store(text_chunks):
    # HuggingFace embeddings + FAISS vector store
    embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    # HuggingFace Hub LLM
    llm = HuggingFaceHub(
        repo_id="bigscience/bloom",
        model_kwargs={"temperature": 0.5, "max_length": 512},
    )
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory,
    )
    return conversation_chain
```
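One guess I have (not verified against the API docs) is that the truncation comes from the generation cap rather than from the chain itself, i.e. that `max_length` isn't the parameter that controls output length on the Hub endpoint. A minimal sketch of what I was planning to try, assuming `HuggingFaceHub` forwards `max_new_tokens` to the inference endpoint:

```python
# Hypothetical tweak (my assumption, not a confirmed fix): pass
# "max_new_tokens" to explicitly request a longer completion, instead of
# relying on "max_length". Whether the endpoint honors this is exactly
# what I'm unsure about.
model_kwargs = {
    "temperature": 0.5,
    "max_new_tokens": 1024,  # raise the generation cap
}

# llm = HuggingFaceHub(repo_id="bigscience/bloom", model_kwargs=model_kwargs)
```

Would this be the right direction, or is the truncation happening somewhere else in the chain?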
Any help would be appreciated!