Function calling not working with inference clients on (seemingly) any model

I’m attempting to use function calling with text inference models, but none of the approaches I’ve tried with the inference clients work. I use OpenAI’s function calling regularly; the Hugging Face equivalents seem fragile and unpredictable, though I suspect I’m just “doing it wrong.”

I’d appreciate any insights into the following cases I’m seeing:

JavaScript inference client fails

If I include only the first message (e.g. “What’s the weather in X place?”), the models correctly return the tool call needed to continue the completion. But when I include the subsequent messages (the assistant’s tool-call message and the tool result), as OpenAI requires, every model I try fails in one of these ways:

Mistral models error out claiming tool_call_ids must be 9-character alphanumeric strings. Even when they are, the same error persists.

The Qwen model (and others) error out with “An error fetching the blob.”

Still other Llama models simply return the original tool-call message again, producing an infinite loop unless I catch it.
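For reference, this is the shape of the message sequence I’m sending — a minimal sketch, where the weather function, its arguments, and the 9-character tool_call_id are illustrative, following the OpenAI-style convention the clients are modeled on:

```python
import random
import string

# Hypothetical 9-character alphanumeric id, matching the format
# Mistral's error message asks for.
def make_tool_call_id() -> str:
    return "".join(random.choices(string.ascii_letters + string.digits, k=9))

tool_call_id = make_tool_call_id()

# OpenAI-style multi-turn sequence: user question, assistant tool call,
# then the tool result keyed by the same tool_call_id.
messages = [
    {"role": "user", "content": "What's the weather in San Giustino, Italy?"},
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": tool_call_id,
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": '{"location": "San Giustino, Italy", "format": "celsius"}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "name": "get_current_weather",
        "content": "22.0",
    },
]
```

It’s the third (“tool”) message whose presence seems to trigger the failures described above.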

Python inference client fails

Similarly, the same flow in Python fails. Again, Mistral models error out with the 9-character claim, even though the tool_call_id is indeed a 9-character alphanumeric string.

Other models fail to return a final completion and instead just repeat the tool_call message.

Again, any insights you’d have I’d appreciate!

1 Like

@Alanturner2 I don’t think it’s a token or authentication problem for this function calling. Was that comment intended to help here as well? I have a token with the proper permissions, and all that’s needed to reproduce the behavior is setting the HF_TOKEN variable.

I tried it twice with Qwen and twice with Zephyr. I wonder what the expected behavior is…

ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content=None, tool_calls=[ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'format': 'celsius', 'location': 'San Giustino, Italy'}, name='get_current_weather', description=None), id='0', type='function')]), logprobs=None)], created=1738590766, id='', model='Qwen/Qwen2.5-72B-Instruct', system_fingerprint='3.0.2-sha-b70f29d', usage=ChatCompletionOutputUsage(completion_tokens=30, prompt_tokens=312, total_tokens=342))   
ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content=None, tool_calls=[ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'format': 'celsius', 'location': 'San Giustino, Italy'}, name='get_current_weather', description=None), id='0', type='function')]), logprobs=None)], created=1738590769, id='', model='Qwen/Qwen2.5-72B-Instruct', system_fingerprint='3.0.2-sha-b70f29d', usage=ChatCompletionOutputUsage(completion_tokens=30, prompt_tokens=339, total_tokens=369)) 

ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content=None, tool_calls=[ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'format': 'celsius', 'location': 'San Giustino (Italy)'}, name='get_current_weather', description=None), id='0', type='function')]), logprobs=None)], created=1738590905, id='', model='HuggingFaceH4/zephyr-7b-beta', system_fingerprint='3.0.1-sha-bb9095a', usage=ChatCompletionOutputUsage(completion_tokens=41, prompt_tokens=300, total_tokens=341))
ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content=None, tool_calls=[ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'format': 'celsius', 'location': 'San Giustino, Italy'}, name='get_current_weather', description=None), id='0', type='function')]), logprobs=None)], created=1738590906, id='', model='HuggingFaceH4/zephyr-7b-beta', system_fingerprint='3.0.1-sha-bb9095a', usage=ChatCompletionOutputUsage(completion_tokens=40, prompt_tokens=101, total_tokens=141))

Glad to see someone can reproduce it! With OpenAI at least, the expected behavior is to get an “assistant” message with a text content that presents a human-readable response, like “The temperature is 4ºC.”
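Concretely, the final-turn response I’d expect (modeled on OpenAI’s behavior; the wording of the answer is illustrative) has content populated and no further tool_calls:

```python
# Hypothetical final assistant turn after the tool result has been
# supplied: content carries the human-readable answer, and no further
# tool calls are requested.
expected_final_message = {
    "role": "assistant",
    "content": "The current temperature in San Giustino, Italy is 4ºC.",
    "tool_calls": None,
}
```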

Thanks for the GitHub link; I’ll look through it. I opened a similar bug in the inference client repo; HF is a big place so I don’t know if that was the right repo.

Even within the InferenceClient class, chat_completion is quite special because it is modeled on the OpenAI API. For now, the calls need to be kept plain, like this.

# First call: pass tools= so the model can emit a tool call.
output = client.chat_completion(messages=messages, tools=tools, max_tokens=500, temperature=0.3)
print(output.choices[0].message.tool_calls[0].function)
print(output.choices[0].message.content)

# Second call: omit tools= so the model produces the final text answer.
output = client.chat_completion(messages=messages, max_tokens=500, temperature=0.3)
print(output.choices[0].message.content)
ChatCompletionOutputFunctionDefinition(arguments={'format': 'celsius', 'location': 'San Giustino, Italy'}, name='get_current_weather', description=None)

None

The current temperature in San Giustino, Italy, is 22.0 degrees Celsius. However, please note that this information might change depending on the time of day and season. For the most accurate and up-to-date weather information, you can check a reliable weather website or app.

It’s probably buggy.
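A minimal sketch of that two-call workaround, wrapped in a helper. The client here is an illustrative stub standing in for InferenceClient (to keep the example self-contained), and the tool name and dispatch table are assumptions, not the actual Inference API:

```python
import json
from types import SimpleNamespace as NS

def answer_with_tools(client, messages, tools, dispatch):
    # First call: include tools= so the model may emit a tool call.
    first = client.chat_completion(messages=messages, tools=tools,
                                   max_tokens=500, temperature=0.3)
    msg = first.choices[0].message
    if not msg.tool_calls:
        return msg.content  # model answered directly
    call = msg.tool_calls[0]
    args = call.function.arguments
    if isinstance(args, str):  # some backends return a JSON string
        args = json.loads(args)
    result = dispatch[call.function.name](**args)
    # Append the tool result, then call again WITHOUT tools= to get the
    # final human-readable answer (the quirk discussed above).
    followup = messages + [{"role": "tool", "tool_call_id": call.id,
                            "name": call.function.name,
                            "content": str(result)}]
    final = client.chat_completion(messages=followup,
                                   max_tokens=500, temperature=0.3)
    return final.choices[0].message.content

# --- illustrative stub standing in for InferenceClient ---
class StubClient:
    def chat_completion(self, messages, tools=None, **kw):
        if tools:  # first phase: emit a tool call
            call = NS(id="abc123XYZ", function=NS(
                name="get_current_weather",
                arguments='{"location": "San Giustino, Italy"}'))
            msg = NS(content=None, tool_calls=[call])
        else:      # second phase: plain text answer
            msg = NS(content="It is 22.0 °C in San Giustino, Italy.",
                     tool_calls=None)
        return NS(choices=[NS(message=msg)])

dispatch = {"get_current_weather": lambda location: 22.0}
answer = answer_with_tools(
    StubClient(),
    [{"role": "user", "content": "What's the weather in San Giustino?"}],
    tools=[{}],
    dispatch=dispatch,
)
```

With the real client, the same helper would be called with an InferenceClient instance and the actual tool schema in place of the stubs.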

Hmm, very interesting. So one omits “tools=tools” once it’s been manually determined that the calls are satisfied…

I’m quite familiar with the OpenAI API, and there are some quirks in HF that I’m obviously stumbling through. For example, in OpenAI, the tools array can always be provided, in case subsequent function calls are needed based on the result.

But I think I can work through that quirk for the time being. Thanks @John6666!

1 Like

Sorry, I was wrong. I think the Inference API is buggy.
The function call is returned correctly, and yet the content comes back as None when that happens.

Edit:
I reported the symptoms on Discord.

1 Like