TypeError: InferenceClient.text_generation() got an unexpected keyword argument 'token'
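The `token` keyword is the culprit: in recent `langchain_huggingface` versions, everything placed in `model_kwargs` is forwarded verbatim to `InferenceClient.text_generation()`, and that method has no `token` parameter (the token belongs to the client, not to the generation call). A minimal sketch of the pattern that triggers the error (the repo id and token value here are illustrative):

```python
# What NOT to do (sketch): "token" in model_kwargs gets forwarded to
# InferenceClient.text_generation(), which rejects unknown keyword arguments.
llm = HuggingFaceEndpoint(
    repo_id="mistralai/Mistral-7B-Instruct-v0.3",
    model_kwargs={"token": "hf_..."},  # <-- raises the TypeError at call time
)
```

Remove it from `model_kwargs` and authenticate through the environment instead, as in the corrected script below.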

```python
import os

from langchain_huggingface import HuggingFaceEndpoint, HuggingFaceEmbeddings
from langchain_core.prompts import PromptTemplate
from langchain.chains import RetrievalQA
from langchain_community.vectorstores import FAISS
from langchain_community.document_loaders import PyPDFLoader, DirectoryLoader
```

Step 1: Set up the LLM (Mistral via Hugging Face)

```python
HF_TOKEN = os.environ.get("HF_TOKEN")
HUGGINGFACE_REPO_ID = "mistralai/Mistral-7B-Instruct-v0.3"

def load_llm(huggingface_repo_id):
    # Do NOT include "token" in model_kwargs; pass the token via an
    # environment variable or a supported class parameter instead.
    os.environ["HUGGINGFACEHUB_API_TOKEN"] = HF_TOKEN
    llm = HuggingFaceEndpoint(
        repo_id=huggingface_repo_id,
        temperature=0.5,
        # "max_length" in model_kwargs would be forwarded to text_generation()
        # and rejected the same way; use the supported parameter instead.
        max_new_tokens=512,
        # Optionally, on some langchain versions:
        # huggingfacehub_api_token=HF_TOKEN,
    )
    return llm
```
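A quick smoke test before wiring up the chain (the prompt text is arbitrary; `HF_TOKEN` must be set in your environment):

```python
# Sanity check: the call should succeed without the "token" TypeError.
llm = load_llm(HUGGINGFACE_REPO_ID)
print(llm.invoke("Say hello in one short sentence."))
```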

Step 2: Connect the LLM with FAISS and create the chain

```python
CUSTOM_PROMPT_TEMPLATE = """
Use the pieces of information provided in the context to answer the user's question.
If you don't know the answer, just say that you don't know; don't try to make up an answer.
Don't provide anything outside the given context.

Context: {context}
Question: {question}

Start the answer directly. No small talk, please.
"""

def set_custom_prompt(custom_prompt_template):
    prompt = PromptTemplate(template=custom_prompt_template, input_variables=["context", "question"])
    return prompt
```
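To verify the template is wired up correctly, you can render it with dummy values (the context and question strings are placeholders):

```python
# Render the template to confirm both input variables resolve.
prompt = set_custom_prompt(CUSTOM_PROMPT_TEMPLATE)
print(prompt.format(context="Example context.", question="Example question?"))
```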

Load Database

```python
DB_FAISS_PATH = "vectorstore/db_faiss"
embedding_model = HuggingFaceEmbeddings(model_name="sentence-transformers/all-MiniLM-L6-v2")
db = FAISS.load_local(DB_FAISS_PATH, embedding_model, allow_dangerous_deserialization=True)
```
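If `vectorstore/db_faiss` does not exist yet, build it once from your PDFs. A minimal sketch using the `PyPDFLoader` and `DirectoryLoader` imports above; the `data/` directory and the chunking parameters are assumptions, so adjust them to your corpus:

```python
# One-time index build (sketch): load PDFs, chunk them, embed, and save.
from langchain.text_splitter import RecursiveCharacterTextSplitter

loader = DirectoryLoader("data/", glob="*.pdf", loader_cls=PyPDFLoader)  # "data/" is assumed
documents = loader.load()
splitter = RecursiveCharacterTextSplitter(chunk_size=500, chunk_overlap=50)
chunks = splitter.split_documents(documents)
FAISS.from_documents(chunks, embedding_model).save_local(DB_FAISS_PATH)
```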

Create QA chain

```python
qa_chain = RetrievalQA.from_chain_type(
    llm=load_llm(HUGGINGFACE_REPO_ID),
    chain_type="stuff",
    retriever=db.as_retriever(search_kwargs={"k": 3}),
    return_source_documents=True,
    chain_type_kwargs={"prompt": set_custom_prompt(CUSTOM_PROMPT_TEMPLATE)},
)
```

Now invoke the chain with a single query

```python
user_query = input("Write Query Here: ")
response = qa_chain.invoke({"query": user_query})
print("RESULT: ", response["result"])
print("SOURCE DOCUMENTS: ", response["source_documents"])
```

Solution provided by Triskel Data Deterministic AI.
