Hi,
I tried to build a QA RAG chatbot using Llama 3, but it fails with the error message below at the “RetrievalQA” step.
The same code works fine with the model “Mistral-7B-Instruct-v0.2”.
Any help here is greatly appreciated.
Code:
llm = HuggingFaceEndpoint(
    repo_id=llm_model,
    huggingfacehub_api_token=HF_API_TOKEN,
    temperature=temperature,
    max_new_tokens=max_tokens,
    top_k=top_k,
)
retriever = vector_db.as_retriever(search_kwargs={"k": 3})
compressor = FlashrankRerank(model="ms-marco-MiniLM-L-12-v2")
compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)
qachain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": False},
)
return qachain
Error Message:
HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct
Input validation error: `inputs` tokens + `max_new_tokens` must be <= 8192. Given: 14087 `inputs` tokens and 256 `max_new_tokens`
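For context on the numbers: the "stuff" chain concatenates all retrieved documents into a single prompt, and the error says that prompt (14087 tokens) plus the generation budget (256) exceeds the model's 8192-token window. A minimal sketch of the check the endpoint is applying, assuming the limit and token counts from the error message (`fits_context` is a hypothetical helper for illustration, not part of LangChain):

```python
# The 8192 limit and the token counts below are taken from the error message;
# this helper is hypothetical, shown only to make the validation rule concrete.
MAX_CONTEXT = 8192  # Meta-Llama-3-8B-Instruct context window (per the error)

def fits_context(input_tokens: int, max_new_tokens: int,
                 limit: int = MAX_CONTEXT) -> bool:
    """True if the prompt plus the generation budget fit the context window."""
    return input_tokens + max_new_tokens <= limit

# The failing request: 14087 + 256 = 14343 > 8192
print(fits_context(14087, 256))
```

This also explains why the identical code runs with Mistral-7B-Instruct-v0.2: its endpoint accepts a larger input, so the same stuffed prompt stays within its limit.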