Meta-Llama-3-8B-Instruct: Validation Error "Max_new_tokens"

Hi,
I tried to build a QA RAG chatbot using Llama 3, but it fails to return a response and throws the error message below. It fails at the “RetrievalQA” step.

The same code works fine with the model “Mistral-7B-Instruct-v0.2”.

Any Help here is greatly appreciated.

Code:
llm = HuggingFaceEndpoint(
repo_id=llm_model,
huggingfacehub_api_token=HF_API_TOKEN,
temperature=temperature,
max_new_tokens=max_tokens,
top_k=top_k,
)

retriever = vector_db.as_retriever(search_kwargs={"k": 3})

compressor = FlashrankRerank(model="ms-marco-MiniLM-L-12-v2")

compression_retriever = ContextualCompressionRetriever(
    base_compressor=compressor, base_retriever=retriever
)

qachain = RetrievalQA.from_chain_type(
    llm=llm,
    chain_type="stuff",
    retriever=compression_retriever,
    return_source_documents=True,
    chain_type_kwargs={"prompt": prompt, "verbose": False},
)

return qachain

Error Message:
HfHubHTTPError: 422 Client Error: Unprocessable Entity for url: https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3-8B-Instruct Input validation error: inputs tokens + max_new_tokens must be <= 8192. Given: 14087 inputs tokens and 256 max_new_tokens

Hi, I'm experiencing the same error. Did you find a solution yet?

Same issue for me here. Are there any workarounds?

Input validation error: inputs tokens + max_new_tokens must be <= 8192. Given: 14087 inputs tokens and 256 max_new_tokens

If this error message is correct, then the input you are sending is too long. Why don't you try a shorter prompt?
It says that the input tokens + max_new_tokens must be at most 8192 tokens combined.
But according to the error, the input alone was over 14,000 tokens…
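
If you want to check this yourself, here is a rough sketch that counts the input tokens before calling the endpoint. It assumes you have transformers installed and the fully rendered prompt (question + retrieved context) in a string; full_prompt is just a placeholder name, and downloading the Llama 3 tokenizer requires having accepted the model license on the Hub:

from transformers import AutoTokenizer

# Use the same tokenizer as the endpoint so the count matches the server's
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Meta-Llama-3-8B-Instruct")

n_input_tokens = len(tokenizer.encode(full_prompt))  # full_prompt: your rendered prompt + context
max_new_tokens = 256

print(n_input_tokens, "input tokens +", max_new_tokens, "new tokens")
if n_input_tokens + max_new_tokens > 8192:
    print("This request will be rejected with the 422 error above")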


In my case, I am getting the error regardless of the token size.
If I work in a .ipynb file, I am able to get outputs from the model correctly. Yet when I plug the model into Streamlit to have a nicer visual interface, I get the error regardless of the token size. I could just input “hello” in a “fresh” session and the error would still be there.
Switching to another model fixed the issue for me

Switching to another model fixed the issue for me

I wonder if this is it?

Hey, the problem comes from the context provided to the model being too long.

The error indicates that your context + prompt is 14,087 tokens long. That is strange, since you only retrieve the top 3 documents to answer. How long are your documents once tokenized?

In my opinion:
Create shorter chunks of your documents in the DB used for retrieval. That way you will provide shorter and more relevant context to the LLM to answer the question. A sketch of what that could look like is below.
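
A minimal sketch of the re-chunking step when you (re)build the vector store, assuming you split the raw documents with LangChain's RecursiveCharacterTextSplitter and use a FAISS store (the store, the embeddings object, and the chunk sizes are only placeholders for whatever you already use):

from langchain.text_splitter import RecursiveCharacterTextSplitter
from langchain_community.vectorstores import FAISS

# Smaller chunks mean each of the top-3 retrieved passages is short,
# so the stuffed context stays well under the 8192-token limit.
splitter = RecursiveCharacterTextSplitter(
    chunk_size=1000,    # characters per chunk, illustrative value
    chunk_overlap=100,  # small overlap so chunks don't cut sentences apart
)
chunks = splitter.split_documents(raw_documents)  # raw_documents: your loaded docs

# Rebuild the vector store from the smaller chunks before creating the retriever
vector_db = FAISS.from_documents(chunks, embeddings)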
