Response from retrieval chain is truncated

Hello fellow HFers,

I am running open-source LLMs such as Bloom, Llama2-7b-hf, and Flan-T5, and I am having the same issue with each: the response from the retrieval chain is truncated after a certain number of characters. I tried searching online and looking through the API parameters but couldn't find anything. Here is the relevant code snippet:

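from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.llms import HuggingFaceHub
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import FAISS

def get_vector_store(text_chunks):
    # Hugging Face Instructor embeddings
    embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
    # Build a FAISS index over the text chunks
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    return vectorstore

# text_chunks is a list of strings prepared elsewhere
vectorstore = get_vector_store(text_chunks)

# LLM served from the Hugging Face Hub
llm = HuggingFaceHub(repo_id="bigscience/bloom", model_kwargs={"temperature": 0.5, "max_length": 512})

# Keep the chat history so follow-up questions have context
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)

conversation_chain = ConversationalRetrievalChain.from_llm(
    llm=llm,
    retriever=vectorstore.as_retriever(),
    memory=memory,
)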

Any help would be appreciated.
