Function calling not working with inference clients on (seemingly) any model

I’m attempting to use function calling with text inference models, but none of the approaches I’ve tried with the inference clients work. I use OpenAI’s function calling regularly; the Hugging Face equivalents seem fragile and unpredictable, though I suspect I’m just “doing it wrong.”

I’d appreciate any insights into the following cases I’m seeing:

JavaScript inference client fails

If I include only the first message (e.g. “What’s the weather in X place?”), the models correctly return the tool call needed to continue the completion. But when I include the subsequent messages (the assistant’s tool-call message and the tool result), as OpenAI requires, every model I try fails in one of these ways:

Mistral models error out claiming tool_call_ids must be 9-character alphanumeric strings. Even when they are, the same error persists.

The Qwen model (and others) error out with “An error fetching the blob.”

Still other Llama models simply return the original tool-call message again, producing an infinite loop unless I catch it.
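For reference, this is the shape of the message sequence I’m sending — a minimal sketch, where the weather function, its arguments, and the 9-character tool_call_id are illustrative, following the OpenAI-style convention the clients are modeled on:

```python
import random
import string

# Hypothetical 9-character alphanumeric id, matching the format
# Mistral's error message asks for.
def make_tool_call_id() -> str:
    return "".join(random.choices(string.ascii_letters + string.digits, k=9))

tool_call_id = make_tool_call_id()

# OpenAI-style multi-turn sequence: user question, assistant tool call,
# then the tool result keyed by the same tool_call_id.
messages = [
    {"role": "user", "content": "What's the weather in San Giustino, Italy?"},
    {
        "role": "assistant",
        "tool_calls": [
            {
                "id": tool_call_id,
                "type": "function",
                "function": {
                    "name": "get_current_weather",
                    "arguments": '{"location": "San Giustino, Italy", "format": "celsius"}',
                },
            }
        ],
    },
    {
        "role": "tool",
        "tool_call_id": tool_call_id,
        "name": "get_current_weather",
        "content": "22.0",
    },
]
```

It’s the third (“tool”) message whose presence seems to trigger the failures described above.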

Python inference client fails

Similarly, the same flow in Python fails. Again, Mistral models error out with the 9-character claim, even though the tool_call_id is indeed a 9-character alphanumeric string.

Other models fail to return a final completion and instead just repeat the tool_call message.

Again, any insights you’d have I’d appreciate!

1 Like

@Alanturner2 I don’t think it’s a token or authentication problem for this function calling. Was that comment intended to help here as well? I have a token with the proper permissions, and all that’s needed to reproduce the behavior is setting the HF_TOKEN variable.

I tried it twice with Qwen and twice with Zephyr. I wonder what the expected behavior is…

ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content=None, tool_calls=[ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'format': 'celsius', 'location': 'San Giustino, Italy'}, name='get_current_weather', description=None), id='0', type='function')]), logprobs=None)], created=1738590766, id='', model='Qwen/Qwen2.5-72B-Instruct', system_fingerprint='3.0.2-sha-b70f29d', usage=ChatCompletionOutputUsage(completion_tokens=30, prompt_tokens=312, total_tokens=342))   
ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content=None, tool_calls=[ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'format': 'celsius', 'location': 'San Giustino, Italy'}, name='get_current_weather', description=None), id='0', type='function')]), logprobs=None)], created=1738590769, id='', model='Qwen/Qwen2.5-72B-Instruct', system_fingerprint='3.0.2-sha-b70f29d', usage=ChatCompletionOutputUsage(completion_tokens=30, prompt_tokens=339, total_tokens=369)) 

ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content=None, tool_calls=[ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'format': 'celsius', 'location': 'San Giustino (Italy)'}, name='get_current_weather', description=None), id='0', type='function')]), logprobs=None)], created=1738590905, id='', model='HuggingFaceH4/zephyr-7b-beta', system_fingerprint='3.0.1-sha-bb9095a', usage=ChatCompletionOutputUsage(completion_tokens=41, prompt_tokens=300, total_tokens=341))
ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content=None, tool_calls=[ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'format': 'celsius', 'location': 'San Giustino, Italy'}, name='get_current_weather', description=None), id='0', type='function')]), logprobs=None)], created=1738590906, id='', model='HuggingFaceH4/zephyr-7b-beta', system_fingerprint='3.0.1-sha-bb9095a', usage=ChatCompletionOutputUsage(completion_tokens=40, prompt_tokens=101, total_tokens=141))

Glad to see someone can reproduce it! With OpenAI at least, the expected behavior is to get an “assistant” message with a text content that presents a human-readable response, like “The temperature is 4ºC.”
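Concretely, the final-turn response I’d expect (modeled on OpenAI’s behavior; the wording of the answer is illustrative) has content populated and no further tool_calls:

```python
# Hypothetical final assistant turn after the tool result has been
# supplied: content carries the human-readable answer, and no further
# tool calls are requested.
expected_final_message = {
    "role": "assistant",
    "content": "The current temperature in San Giustino, Italy is 4ºC.",
    "tool_calls": None,
}
```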

Thanks for the GitHub link; I’ll look through it. I opened a similar bug in the inference client repo; HF is a big place so I don’t know if that was the right repo.

Even within the InferenceClient class, chat_completion is quite special because it is modeled on the OpenAI API. For now, the calls need to be kept plain, like this.

# First call: pass tools= so the model can emit a tool call.
output = client.chat_completion(messages=messages, tools=tools, max_tokens=500, temperature=0.3)
print(output.choices[0].message.tool_calls[0].function)
print(output.choices[0].message.content)

# Second call: omit tools= so the model produces the final text answer.
output = client.chat_completion(messages=messages, max_tokens=500, temperature=0.3)
print(output.choices[0].message.content)
ChatCompletionOutputFunctionDefinition(arguments={'format': 'celsius', 'location': 'San Giustino, Italy'}, name='get_current_weather', description=None)

None

The current temperature in San Giustino, Italy, is 22.0 degrees Celsius. However, please note that this information might change depending on the time of day and season. For the most accurate and up-to-date weather information, you can check a reliable weather website or app.

It’s probably buggy.
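A minimal sketch of that two-call workaround, wrapped in a helper. The client here is an illustrative stub standing in for InferenceClient (to keep the example self-contained), and the tool name and dispatch table are assumptions, not the actual Inference API:

```python
import json
from types import SimpleNamespace as NS

def answer_with_tools(client, messages, tools, dispatch):
    # First call: include tools= so the model may emit a tool call.
    first = client.chat_completion(messages=messages, tools=tools,
                                   max_tokens=500, temperature=0.3)
    msg = first.choices[0].message
    if not msg.tool_calls:
        return msg.content  # model answered directly
    call = msg.tool_calls[0]
    args = call.function.arguments
    if isinstance(args, str):  # some backends return a JSON string
        args = json.loads(args)
    result = dispatch[call.function.name](**args)
    # Append the tool result, then call again WITHOUT tools= to get the
    # final human-readable answer (the quirk discussed above).
    followup = messages + [{"role": "tool", "tool_call_id": call.id,
                            "name": call.function.name,
                            "content": str(result)}]
    final = client.chat_completion(messages=followup,
                                   max_tokens=500, temperature=0.3)
    return final.choices[0].message.content

# --- illustrative stub standing in for InferenceClient ---
class StubClient:
    def chat_completion(self, messages, tools=None, **kw):
        if tools:  # first phase: emit a tool call
            call = NS(id="abc123XYZ", function=NS(
                name="get_current_weather",
                arguments='{"location": "San Giustino, Italy"}'))
            msg = NS(content=None, tool_calls=[call])
        else:      # second phase: plain text answer
            msg = NS(content="It is 22.0 °C in San Giustino, Italy.",
                     tool_calls=None)
        return NS(choices=[NS(message=msg)])

dispatch = {"get_current_weather": lambda location: 22.0}
answer = answer_with_tools(
    StubClient(),
    [{"role": "user", "content": "What's the weather in San Giustino?"}],
    tools=[{}],
    dispatch=dispatch,
)
```

With the real client, the same helper would be called with an InferenceClient instance and the actual tool schema in place of the stubs.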

Hmm, very interesting. So one omits “tools=tools” once it’s been manually determined that the calls are satisfied…

I’m quite familiar with the OpenAI API, and there are some quirks in HF that I’m obviously stumbling through. For example, in OpenAI, the tools array can always be provided, in case subsequent function calls are needed based on the result.

But I think I can work through that quirk for the time being. Thanks @John6666!

1 Like

Sorry, I was wrong. I think the Inference API is buggy.
The function call is returned correctly, and yet the content comes back as None when that happens.

Edit:
I reported the symptoms on Discord.

1 Like