Meta-Llama-3-8B-Instruct: Validation Error "Max_new_tokens"

Hi,
I tried to build a QA RAG chatbot using Llama 3, but it fails to return a response and raises the error below. The failure occurs when the “RetrievalQA” chain is invoked.

The same code works fine with the model “Mistral-7B-Instruct-v0.2”.

Any help here is greatly appreciated.

Code:
llm = HuggingFaceEndpoint(
    repo_id=llm_model,
    huggingfacehub_api_token=HF_API_TOKEN,
    temperature=temperature,
    max_new_tokens=max_tokens,
    top_k=top_k,
)

# Retrieve the top 3 chunks, then rerank them with Flashrank.
retriever = vector_db.as_retriever(search_kwargs={"k": 3})

compressor = FlashrankRerank(model="ms-marco-MiniLM-L-12-v2")

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

qachain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",  # all retrieved chunks are stuffed into a single prompt
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": False},
)

return qachain
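One way to avoid overflowing the model's context window is to trim the retrieved chunks before they reach the "stuff" prompt. Below is a minimal sketch of that idea; the `fit_to_budget` helper and the 4-characters-per-token estimate are my own assumptions, not part of the original code, and the model's real tokenizer would give an exact count:

```python
def fit_to_budget(docs, budget_tokens, chars_per_token=4):
    """Keep retrieved chunks, in order, until an estimated token budget
    is reached; drop the rest.

    Token count is estimated at roughly 4 characters per token, which is
    a crude heuristic -- swap in the model's tokenizer for exact counts.
    """
    kept, used = [], 0
    for doc in docs:
        cost = max(1, len(doc) // chars_per_token)
        if used + cost > budget_tokens:
            break
        kept.append(doc)
        used += cost
    return kept


# With an 8192-token window and 256 new tokens reserved, the prompt
# (template + context) must fit in 8192 - 256 tokens.
budget = 8192 - 256
chunks = ["A" * 20000, "B" * 20000, "C" * 20000]  # ~5000 est. tokens each
trimmed = fit_to_budget(chunks, budget)
print(len(trimmed))  # only the chunks that fit the budget survive
```

Reducing `k` in `search_kwargs`, indexing smaller chunks in the first place, or truncating like this would each bring the prompt under the limit.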

Error Message:
HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct Input validation error: inputs tokens + max_new_tokens must be <= 8192. Given: 14087 inputs tokens and 256 max_new_tokens
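The numbers in the error explain the failure: Meta-Llama-3-8B-Instruct has an 8192-token context window, and the retrieved chunks plus the prompt template already take 14087 tokens before the 256 new tokens are reserved. Mistral-7B-Instruct-v0.2 advertises a much larger window (32768 tokens), which is presumably why the identical code succeeds there. A quick sanity check of the budget (the 14087/256 figures come straight from the error; the window sizes are the models' published limits):

```python
def fits(input_tokens, max_new_tokens, context_window):
    """True when the prompt plus the generation budget fit the window."""
    return input_tokens + max_new_tokens <= context_window

# Values taken from the error message above.
print(fits(14087, 256, 8192))    # Llama-3-8B-Instruct: overflow
print(fits(14087, 256, 32768))   # Mistral-7B-Instruct-v0.2: fits
```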

Hi, I'm experiencing the same error. Did you find a solution yet?