Tool calling gets stuck in an infinite loop

hadleywickham · April 10, 2025, 12:55pm

I’m having problems using tool calling with the inference endpoint. I’m following the documentation (such as it is), and the examples in Tool Use, Unified. But the API doesn’t seem to recognise that I’ve returned a result, and just returns another tool call request:

```json
{
  "messages": [
    {
      "role": "system",
      "content": "Be very terse, not even punctuation."
    },
    {
      "role": "user",
      "content": [
        {
          "type": "text",
          "text": "What's the current date in Y-M-D format?"
        }
      ]
    },
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "0",
          "function": {
            "name": "tool_001",
            "arguments": "{}"
          },
          "type": "function"
        }
      ]
    },
    {
      "role": "tool",
      "content": "2024-01-01",
      "name": "tool_001",
      "tool_call_id": "0"
    },
    {
      "role": "assistant",
      "tool_calls": [
        {
          "id": "0",
          "function": {
            "name": "tool_001",
            "arguments": "{}"
          },
          "type": "function"
        }
      ]
    },
    {
      "role": "tool",
      "content": "2024-01-01",
      "name": "tool_001",
      "tool_call_id": "0"
    }
  ],
  "model": "meta-llama/Llama-3.1-8B-Instruct",
  "stream": false,
  "tools": [
    {
      "type": "function",
      "function": {
        "name": "tool_001",
        "description": "Return the current date",
        "strict": true,
        "parameters": {
          "type": "object",
          "description": "",
          "properties": {},
          "required": [],
          "additionalProperties": false
        }
      }
    }
  ]
}
```

Any ideas what I'm doing wrong?
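
For anyone trying to confirm the same symptom, the repetition in the request above can be checked mechanically. This is only a minimal Python sketch, assuming OpenAI-style message dicts like the ones in the request; `tool_call_signature` and `repeated_tool_call` are hypothetical helper names, not part of any library. It is a heuristic: a model that legitimately calls the same tool twice with the same arguments would also be flagged.

```python
import json


def tool_call_signature(call):
    """Reduce a tool call to (name, canonical arguments) so calls can be compared."""
    fn = call["function"]
    args = fn.get("arguments") or "{}"
    # Arguments may arrive as a JSON string or as an already-parsed dict.
    if isinstance(args, str):
        args = json.loads(args)
    return fn["name"], json.dumps(args, sort_keys=True)


def repeated_tool_call(messages):
    """Return True if an assistant turn re-issues a call that a tool message already answered."""
    answered = set()  # signatures of calls that already received a tool result
    pending = {}      # tool_call_id -> signature, still waiting for a result
    for msg in messages:
        if msg["role"] == "assistant":
            for call in msg.get("tool_calls") or []:
                sig = tool_call_signature(call)
                if sig in answered:
                    return True
                pending[call["id"]] = sig
        elif msg["role"] == "tool":
            sig = pending.pop(msg.get("tool_call_id"), None)
            if sig is not None:
                answered.add(sig)
    return False


# The same call as in the request above, reused for both assistant turns.
CALL = {"id": "0", "type": "function",
        "function": {"name": "tool_001", "arguments": "{}"}}

history = [
    {"role": "user", "content": "What's the current date in Y-M-D format?"},
    {"role": "assistant", "tool_calls": [CALL]},
    {"role": "tool", "content": "2024-01-01", "name": "tool_001", "tool_call_id": "0"},
    {"role": "assistant", "tool_calls": [CALL]},
]

print(repeated_tool_call(history))      # True: the answered call is requested again
print(repeated_tool_call(history[:3]))  # False: one call, one result
```

A check like this does not fix anything on the server side; it only lets a client stop sending requests once the model starts asking for a result it already has.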

John6666 · April 11, 2025, 4:12am

There is probably something wrong with the function calling in all models. It may be a bug in TGI.

github.com/huggingface/text-generation-inference

Function/tool calling never resolves

opened 06:23PM - 02 Feb 25 UTC

awmartin

### Description

When using the inference client with function calling, models seem to never resolve their calls. As we know, typically, with the OpenAI pattern, the simplest function/tool call is a series of messages of various roles (system, user, assistant, tool) organized like this:

system → user ("what's the weather?") → assistant (tool_calls) → tool (result: "4ºC") → assistant (content: "it's 4ºC")

The HF docs seem to indicate this is the same pattern, although the messages have some minor differences (e.g. description: null, which never happens with OpenAI).

When using the Python inference client, these tool_calls never resolve even after functions are called and their return values are included and seemingly properly referenced. Instead, they look like this:

system → user ("what's the weather?") → assistant (tool_calls) → tool (result: "4ºC") → assistant (tool_calls) …

Instead of returning a text completion, the HF inference client returns the same "assistant" message specifying a required tool_calls. In OpenAI, they resolve to a typical "assistant" message with token content if the function calls have been satisfied and no further calls are required.

Models used that exhibit this behavior:

- NousResearch/Hermes-3-Llama-3.1-8B
- Qwen/Qwen2.5-72B-Instruct
- meta-llama/Meta-Llama-3-8B-Instruct

---

It's worth noting that Mistral models also error out, specifying that a 9-character alphanumeric string is required for the `tool_call_id`. Now, the models themselves don't provide such IDs, so we need to supply them ourselves. But even when doing so, the same error occurs, that 9-char identifiers are missing. (e.g. mistralai/Mistral-7B-Instruct-v0.3)

The JavaScript client also fails with the above errors, and also a third: "An error occurred while fetching the blob".

### System Info

- macOS 15.2
- Python 3.13.1
- huggingface_hub 0.28.1

### Information

- [ ] Docker
- [x] The CLI directly

### Tasks

- [x] An officially supported command
- [ ] My own modifications

### Reproduction

Gist of sample error code is here: https://gist.github.com/awmartin/c64c84fbbdc3a9f0c2ce6e5ae0dab3dc

1. Provide API token
2. python inference-tool-calls.py

A message results that's unexpected. I expected this to be a typical message with a string content, something like, "It's 4 degrees today." Instead, it just repeats the assistant message with the original tool_call message:

[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content=None, tool_calls=[ChatCompletionOutputToolCall(function=ChatCompletionOutputFunctionDefinition(arguments={'unit': 'Celsius', 'location': 'Philadelphia, PA, US'}, name='get_current_temperature', description=None), id='0', type='function')]), logprobs=None)]

### Expected behavior

I expected a message that resolved to something similar to "It's 4 degrees Celsius today" rather than the tool_call message repeated.
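
The expected flow described in the issue (assistant tool_calls, then a tool result, then a plain assistant reply) can be sketched as a bounded client-side loop. This is an illustrative Python sketch, not the inference client's actual behaviour: `run_with_tools`, `MAX_ROUNDS`, and the two stub "models" are hypothetical names, and `chat` stands in for whatever function sends the messages to the real endpoint. The cap on rounds is an assumption added so that a model stuck in the reported loop raises an error instead of spinning forever.

```python
import json
from datetime import date

MAX_ROUNDS = 3  # assumption: an arbitrary cap on tool rounds, tune as needed


def current_date():
    """Local tool the model may call; returns today's date as YYYY-MM-DD."""
    return date.today().isoformat()


TOOLS = {"tool_001": current_date}


def run_with_tools(chat, messages, max_rounds=MAX_ROUNDS):
    """Call `chat` until it returns text, executing requested tools in between.

    `chat` is any callable that takes the message list and returns an
    assistant message dict (a hypothetical stand-in for the endpoint call).
    """
    messages = list(messages)
    for _ in range(max_rounds):
        reply = chat(messages)
        messages.append(reply)
        calls = reply.get("tool_calls") or []
        if not calls:
            # No further tool calls: this is the final text answer.
            return reply.get("content"), messages
        for call in calls:
            fn = call["function"]
            raw = fn.get("arguments") or "{}"
            args = json.loads(raw) if isinstance(raw, str) else raw
            result = TOOLS[fn["name"]](**args)
            messages.append({"role": "tool", "tool_call_id": call["id"],
                             "name": fn["name"], "content": str(result)})
    raise RuntimeError(f"no text reply after {max_rounds} tool rounds")


CALL = {"id": "0", "type": "function",
        "function": {"name": "tool_001", "arguments": "{}"}}


def well_behaved(messages):
    """Stub model: requests the tool once, then answers from the tool result."""
    if messages[-1]["role"] == "tool":
        return {"role": "assistant", "content": messages[-1]["content"]}
    return {"role": "assistant", "tool_calls": [CALL]}


def stuck(messages):
    """Stub model that mimics the reported bug: always another tool call."""
    return {"role": "assistant", "tool_calls": [CALL]}


text, msgs = run_with_tools(well_behaved, [{"role": "user", "content": "date?"}])
print([m["role"] for m in msgs])  # ['user', 'assistant', 'tool', 'assistant']
```

Passing `stuck` instead of `well_behaved` raises `RuntimeError` after `MAX_ROUNDS` rounds, which is roughly what a client would want to see while the underlying issue is open.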

hadleywickham · April 12, 2025, 12:33pm

Thank you!
