Error in generating model output: InferenceClient.chat_completion() got an unexpected keyword argument 'last_input_token_count'

from smolagents import CodeAgent, InferenceClientModel

# model is an OpenAI model defined earlier (not shown in this snippet)
agent = CodeAgent(tools=[], model=model)

alfred_agent = agent.from_hub('sergiopaniego/AlfredAgent', token=hf_token, trust_remote_code=True)

alfred_agent.run("Give me best playlist for a party at the Wayne's mansion. The party idea is a 'villain masquerade' theme")

In the above code I am using an LLM from OpenAI in the agent instead of the Qwen model, but in alfred_agent I am using the default model provided by Hugging Face. When running the code I get the errors "AgentGenerationError: Error in generating model output: InferenceClient.chat_completion() got an unexpected keyword argument 'last_input_token_count'" and "TypeError: InferenceClient.chat_completion() got an unexpected keyword argument 'last_input_token_count'".

What can I do to use a GPT model for the LLM part (I am hitting the usage limit with the model provided by Hugging Face)? Can I use the agent provided by Hugging Face without any issue, or is there another solution?


For example, if you want to use an OpenAI LLM, I think you need to use OpenAIServerModel instead of InferenceClientModel.

Maybe like this.

import os

from smolagents import CodeAgent, OpenAIServerModel

# Route the agent's LLM calls to OpenAI's API instead of the Hugging Face Inference API
model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.getenv("OPENAI_API_KEY", None),
)
agent = CodeAgent(tools=[], model=model)
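
And if you also want the agent loaded from the Hub to use the GPT model instead of the default Hugging Face one, one option is to replace its model after loading. This is just a sketch, assuming the loaded agent exposes the model it uses as a plain model attribute:

import os

from smolagents import CodeAgent, OpenAIServerModel

# Assumes OPENAI_API_KEY is set in the environment and hf_token holds a valid Hugging Face token
model = OpenAIServerModel(
    model_id="gpt-4o",
    api_base="https://api.openai.com/v1",
    api_key=os.getenv("OPENAI_API_KEY", None),
)

alfred_agent = CodeAgent.from_hub('sergiopaniego/AlfredAgent', token=hf_token, trust_remote_code=True)

# Swap the saved InferenceClientModel for the OpenAI model before running
alfred_agent.model = model

alfred_agent.run("Give me best playlist for a party at the Wayne's mansion. The party idea is a 'villain masquerade' theme")

Since the saved model configuration is then no longer used at run time, this should also avoid the 'last_input_token_count' keyword being forwarded to InferenceClient.chat_completion(). If the error persists, upgrading smolagents and huggingface_hub to matching recent versions may be worth trying too.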