Meta-Llama-3-8B-Instruct: Validation Error "Max_new_tokens"

Hi,
I tried to build a QA RAG chatbot using Llama 3, but it fails to return a response and throws the error message below. It fails at the “RetrievalQA” step.

The same code works fine with the model “Mistral-7B-Instruct-v0.2”.

Any Help here is greatly appreciated.

Code:
llm = HuggingFaceEndpoint(
repo_id=llm_model,
huggingfacehub_api_token=HF_API_TOKEN,
temperature=temperature,
max_new_tokens=max_tokens,
top_k=top_k,
)

retriever = vector_db.as_retriever(search_kwargs={"k": 3})

compressor = FlashrankRerank(model="ms-marco-MiniLM-L-12-v2")

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

qachain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": False},
)

return qachain

Error Message:
HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct Input validation error: inputs tokens + max_new_tokens must be <= 8192. Given: 14087 inputs tokens and 256 max_new_tokens

Hi, I'm experiencing the same error. Did you find a solution yet?

Same issue for me here. Are there any workarounds?

Input validation error: inputs tokens + max_new_tokens must be <= 8192. Given: 14087 inputs tokens and 256 max_new_tokens

If this error message is correct, then the input you are sending is too long. Why don't you try a shorter prompt?
It says that the input tokens + max_new_tokens must be at most 8192 tokens combined.
But according to the error, the input alone was over 14,000 tokens…
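
If you want to check this yourself, here is a rough sketch that counts the input tokens before calling the endpoint. It assumes you have transformers installed and the fully rendered prompt (question + retrieved context) in a string; full_prompt is just a placeholder name, and downloading the Llama 3 tokenizer requires having accepted the model license on the Hub:

from transformers import AutoTokenizer

# Use the same tokenizer as the endpoint so the count matches the server's
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

n_input_tokens = len(tokenizer.encode(full_prompt))  # full_prompt: your rendered prompt + context
max_new_tokens = 256

print(n_input_tokens, "input tokens +", max_new_tokens, "new tokens")
if n_input_tokens + max_new_tokens > 8192:
    print("This request will be rejected with the 422 error above")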


In my case, I am getting the error regardless of the token size.
If I work in a .ipynb file, I am able to get outputs from the model correctly. Yet when I plug the model into Streamlit to have a nicer visual interface, I get the error regardless of the token size. I could just input “hello” in a “fresh” session and the error would still be there.
Switching to another model fixed the issue for me

Switching to another model fixed the issue for me

I wonder if this is it?

Hey, the problem comes from the context provided to the model being too long.

The error indicates that your context + prompt is 14,087 tokens long. That is strange, since you only retrieve the top 3 documents to answer. How long are your documents once tokenized?

In my opinion:
Create shorter chunks of your documents in the DB used for retrieval. That way you will provide shorter and more relevant context to the LLM to answer the question. A sketch of what that could look like is below.
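
A minimal sketch of the re-chunking step when you (re)build the vector store, assuming you split the raw documents with LangChain's RecursiveCharacterTextSplitter and use a FAISS store (the store, the embeddings object, and the chunk sizes are only placeholders for whatever you already use):

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS

# Smaller chunks mean each of the top-3 retrieved passages is short,
# so the stuffed context stays well under the 8192-token limit.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # characters per chunk, illustrative value
    chunk_overlap=100,  # small overlap so chunks don't cut sentences apart
)
chunks = splitter.split_documents(raw_documents)  # raw_documents: your loaded docs

# Rebuild the vector store from the smaller chunks before creating the retriever
vector_db = FAISS.from_documents(chunks, embeddings)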
