Hello fellow HFers,
I am running open-source LLMs such as Bloom, Llama2-7b-hf, and Flan-T5, and I am hitting the same issue with each: the response from the conversational retrieval chain is truncated after a certain number of characters. I tried searching online and looking through the API parameters but couldn't find anything. Here is the relevant code snippet:
```python
from langchain.chains import ConversationalRetrievalChain
from langchain.embeddings import HuggingFaceInstructEmbeddings
from langchain.llms import HuggingFaceHub
from langchain.memory import ConversationBufferMemory
from langchain.vectorstores import FAISS


def get_vector_store(text_chunks):
    # HuggingFace embeddings + FAISS vector store
    embeddings = HuggingFaceInstructEmbeddings(model_name="hkunlp/instructor-large")
    vectorstore = FAISS.from_texts(texts=text_chunks, embedding=embeddings)
    # HuggingFace Hub LLM
    llm = HuggingFaceHub(
        repo_id="bigscience/bloom",
        model_kwargs={"temperature": 0.5, "max_length": 512},
    )
    memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
    conversation_chain = ConversationalRetrievalChain.from_llm(
        llm=llm,
        retriever=vectorstore.as_retriever(),
        memory=memory,
    )
    return conversation_chain
```
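One guess I have (not verified against the API docs) is that the truncation comes from the generation cap rather than from the chain itself, i.e. that `max_length` isn't the parameter that controls output length on the Hub endpoint. A minimal sketch of what I was planning to try, assuming `HuggingFaceHub` forwards `max_new_tokens` to the inference endpoint:

```python
# Hypothetical tweak (my assumption, not a confirmed fix): pass
# "max_new_tokens" to explicitly request a longer completion, instead of
# relying on "max_length". Whether the endpoint honors this is exactly
# what I'm unsure about.
model_kwargs = {
    "temperature": 0.5,
    "max_new_tokens": 1024,  # raise the generation cap
}

# llm = HuggingFaceHub(repo_id="bigscience/bloom", model_kwargs=model_kwargs)
```

Would this be the right direction, or is the truncation happening somewhere else in the chain?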
Any help would be appreciated!