What are the standard Function formats called?

There seems to be a lot of work happening to fine-tune LLMs to be able to support functions. Are there standard formats? What are they called?

For instance, OpenAI's models were fine-tuned to enable functions with a specific JSON format, i.e.:

Function calls are returned like so:
{
  "role": "assistant",
  "function_call": { "name": "funcName", "arguments": {} }
}
and the results are passed like so:
{
  "role": "function",
  "content": { "hello": "world" }
}

Does this schema have a name? Are there others?


Yes, I know what JSON is.

I’m trying to figure out whether (and which) schemas are being trained on, so I can build a generic client to execute functions. If you haven’t worked with streaming data, it’s important to know the schema in advance because the response doesn’t arrive as a single entity. You’re basically marshalling a response.
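For example, here’s a minimal sketch of what I mean by marshalling, assuming the stream delivers partial "function_call" deltas the way OpenAI’s chunked responses do (the chunk shape here is an assumption; adapt it to whatever the platform actually sends):

import json

def collect_function_call(chunks):
    """Accumulate streamed function_call fragments into a name and parsed arguments."""
    name, argument_parts = None, []
    for chunk in chunks:
        choice = chunk["choices"][0]
        call = choice.get("delta", {}).get("function_call")
        if call:
            name = call.get("name", name)
            argument_parts.append(call.get("arguments", ""))
        if choice.get("finish_reason") == "function_call":
            break
    # Only now is the arguments string complete enough to parse.
    return name, json.loads("".join(argument_parts))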

It’s possible that I don’t understand the question, so feel free to correct me. It helps if the inquiry is clear and understood.

I do think you’re confusing the Schema with the RFC Protocol. If you’re looking for something more specific, then it’s just ChatML.

The template is just the sequence that’s used for Pre-training and/or Fine-tuning. So you’re actually asking multiple questions at once.

Technically, it’s just BNF Grammar, but the model is trained on this and learns it. It’s just a matter of tuning the model so it can respond accordingly. The backend API is responsible for managing the call. The frontend API is simply managing the response.

You can always coerce the model to respond in the JSON schema format as well.

This is why the OpenAI API response looks like the following:

{
  "id": "chatcmpl-87bRKWW823bhsr6OcrxM2UyCFxPAV",
  "object": "chat.completion",
  "created": 1696822906,
  "model": "gpt-3.5-turbo-0613",
  "choices": [
    {
      "index": 0,
      "message": {
        "role": "assistant",
        "content": null,
        "function_call": {
          "name": "get_current_weather",
          "arguments": "{\n  \"location\": \"New York City, NY\"\n}"
        }
      },
      "finish_reason": "function_call"
    }
  ],
  "usage": {
    "prompt_tokens": 132,
    "completion_tokens": 20,
    "total_tokens": 152
  }
}

Regardless, it’s just JSON and the JSON Schema with some clever fine-tuning and programming.
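To make that concrete, here’s a rough sketch of how a function definition gets described with JSON Schema and passed along with the chat messages (the get_current_weather function and its parameters are purely illustrative, and the request is a plain HTTP POST rather than any particular SDK):

import os
import requests

# The function definition: its "parameters" block is ordinary JSON Schema.
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather for a location.",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {"type": "string", "description": "City and state"}
            },
            "required": ["location"],
        },
    }
]

response = requests.post(
    "https://api.openai.com/v1/chat/completions",
    headers={"Authorization": f"Bearer {os.environ['OPENAI_API_KEY']}"},
    json={
        "model": "gpt-3.5-turbo-0613",
        "messages": [{"role": "user", "content": "How is the weather today?"}],
        "functions": functions,
    },
    timeout=30,
)
print(response.json()["choices"][0]["message"])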

Thanks. This is starting to get at my question.

The model produces a result in ChatML, which is then transformed into JSON (or whatever) when the handler builds the response. That makes sense. So, the answer to my question is no. The response payload will always be platform-specific because the platform is responsible for that.

I appreciate you explaining that. I’m going to read up on it. I wonder how the function arguments get formed. The markup must be recursive somehow.

That’s what I wondered about for a while as well. The model actually responds in JSON that follows the schema.

Allow me to further clarify what’s happening for you.

Technically, GPT’s response in the provided example is actually the following.

{
  "name": "get_current_weather",
  "arguments": "{\n  \"location\": \"New York City, NY\"\n}"
}

This is the model’s response. The GPT model is fine-tuned to respond accordingly, given a context that includes function definitions written in JSON Schema.

The backend client then handles how the response is propagated. On a server, when streaming via SSE, you’d get "chunks" of JSON responses until a stop token is received by the client.
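Once the arguments string is complete, the backend can parse it, call the matching local implementation, and feed the result back as a role "function" message. A minimal sketch (the lookup table and the weather stub are assumptions for illustration, not anything the API prescribes):

import json

# Hypothetical local implementations the backend is willing to execute.
def get_current_weather(location: str) -> dict:
    return {"location": location, "forecast": "sunny"}

LOCAL_FUNCTIONS = {"get_current_weather": get_current_weather}

def dispatch(function_call: dict) -> dict:
    """Turn the model's function_call into a role-'function' follow-up message."""
    func = LOCAL_FUNCTIONS[function_call["name"]]
    arguments = json.loads(function_call["arguments"])  # arguments arrive as a JSON string
    result = func(**arguments)
    return {
        "role": "function",
        "name": function_call["name"],
        "content": json.dumps(result),  # results go back to the model as a string
    }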

The template the model is trained on is an entirely different story. This is a bit more complicated and nuanced than it initially seems.

The vocabulary is usually based on BPE, which affects the special tokens, e.g. BOS, EOS, UNK, PAD, etc. Then you have the special tokens for the chat template, e.g. ChatML.

So, if you were interacting with the model directly, with no layers in between other than a simple CLI interface, you’d see something like:

<|im_start|>system
My name is ChatGPT. I am a helpful assistant. I will only respond with the functions given to me.<|im_end|>
# function definitions are handled by the trainer/developer
<|im_start|>user
How is the weather today?<|im_end|>
<|im_start|>assistant
{
  "function_call": {
    "name": "get_current_weather",
    "arguments": "{\n  \"location\": \"New York City, NY\"\n}"
  }
}
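A minimal sketch of how that markup might be rendered from a plain message list (this is just my guess at the template step, not how any particular backend actually implements it):

def render_chatml(messages):
    """Flatten a list of {'role', 'content'} dicts into ChatML markup."""
    parts = [
        f"<|im_start|>{message['role']}\n{message['content']}<|im_end|>"
        for message in messages
    ]
    # Leave an open assistant turn for the model to complete.
    parts.append("<|im_start|>assistant\n")
    return "\n".join(parts)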

I haven’t gotten past rationalizing this part yet. Still working on it. If you find out how this is done, I’m open to feedback.

It feels like there is another process behind the scenes.

I get that the model can respond in pure JSON, and maybe it’s easier, but wouldn’t that invariably lead to malformed responses, like how humans sometimes use camelCase when snake_case is needed by an API? It seems fundamentally easier to tell it what the structure is and then have it fill in the blanks.

Although, since you can easily use a computer to generate and validate petabytes of properly formed and malformed examples, it might be easy to tune it.
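For instance, checking generated samples against the same JSON Schema used in the function definition is cheap to automate. A small sketch using the jsonschema package (the schema is the illustrative weather one from above):

import json
from jsonschema import ValidationError, validate

parameters_schema = {
    "type": "object",
    "properties": {"location": {"type": "string"}},
    "required": ["location"],
}

def is_well_formed(arguments_text: str) -> bool:
    """True if the arguments string parses as JSON and matches the schema."""
    try:
        validate(json.loads(arguments_text), parameters_schema)
        return True
    except (json.JSONDecodeError, ValidationError):
        return False

print(is_well_formed('{"location": "New York City, NY"}'))  # True
print(is_well_formed('{"Location": "NYC"'))                  # False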

I’m going to look into it. Do you know of any open models that support function calling?

It’s a sequence of steps all culminating in the function call. Like I said, you could also coerce the model into a function call using BNF.

I haven’t used any of the function-calling models myself yet, but I’ve been keeping an eye out: