That’s what I wondered about for a while as well. The model actually responds with plain JSON text; it’s the function definitions handed to it that are written in JSON Schema. Allow me to clarify what’s happening.
Technically, GPT’s response in the provided example is the following:
{
  "name": "get_current_weather",
  "arguments": "{\n \"location\": \"New York City, NY\"\n}"
}
This is the model’s response. The GPT model is fine-tuned to respond this way when its context includes function definitions written in JSON Schema.
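For reference, this is roughly how such a definition reaches the model through the API. The sketch below assumes the 0.x-era Python client (openai.ChatCompletion with a functions list); the weather schema is just the standard illustrative example, not something from the original post.

import openai  # assumes the 0.x-era client that exposed openai.ChatCompletion

# A function definition: its "parameters" field is plain JSON Schema.
functions = [
    {
        "name": "get_current_weather",
        "description": "Get the current weather in a given location",
        "parameters": {
            "type": "object",
            "properties": {
                "location": {
                    "type": "string",
                    "description": "The city and state, e.g. New York City, NY",
                }
            },
            "required": ["location"],
        },
    }
]

response = openai.ChatCompletion.create(
    model="gpt-3.5-turbo-0613",
    messages=[{"role": "user", "content": "How is the weather today?"}],
    functions=functions,
    function_call="auto",  # let the model decide whether to call a function
)

# The structure shown above comes back under message["function_call"].
print(response["choices"][0]["message"]["function_call"])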
The backend client then handles how the response is propagated. On a server, when streaming via SSE, you’d get “chunks” of the JSON response until the client receives a stop token.
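To make the streaming case concrete, here’s a rough sketch of how a client could stitch those chunks back together. The chunk shape (a delta carrying fragments of function_call.arguments, then a finish_reason) follows the streamed Chat Completions format; the helper name is mine.

import json

def accumulate_function_call(chunks):
    """Collect a streamed function call from Chat Completion chunks.

    Each chunk is the parsed JSON of one SSE "data:" line; the arguments
    string arrives in fragments and is only valid JSON once complete.
    """
    name = ""
    arguments = ""
    for chunk in chunks:
        choice = chunk["choices"][0]
        call = choice.get("delta", {}).get("function_call")
        if call:
            name += call.get("name", "")
            arguments += call.get("arguments", "")
        if choice.get("finish_reason") == "function_call":
            break  # the model has finished emitting the call
    return {"name": name, "arguments": json.loads(arguments)}

Only after the final chunk is the arguments string parseable, which is why it gets buffered rather than parsed per chunk.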
The template the model is trained on is an entirely different story. This is a bit more complicated and nuanced than it initially seems.
The vocabulary is usually built with BPE, and the tokenizer reserves special tokens such as BOS, EOS, UNK, PAD, etc. On top of those, you have the special tokens for the chat template, e.g. ChatML’s <|im_start|> and <|im_end|>.
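To illustrate that split between the base BPE vocabulary’s special tokens and the chat template’s, here’s a sketch with a Hugging Face tokenizer; GPT-2’s vocabulary is only a stand-in, since OpenAI’s production tokenizer isn’t available in this form.

from transformers import AutoTokenizer

# GPT-2's BPE vocabulary already reserves <|endoftext|> as its EOS token.
tok = AutoTokenizer.from_pretrained("gpt2")
print(tok.eos_token)  # <|endoftext|>

# The ChatML delimiters are added on top of the base vocabulary, so they
# encode to single reserved IDs instead of being split apart by BPE.
tok.add_special_tokens({"additional_special_tokens": ["<|im_start|>", "<|im_end|>"]})
print(tok("<|im_start|>user\nHow is the weather today?<|im_end|>")["input_ids"])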
So, if you were interacting with the model directly, with no layers in between other than a simple CLI interface, you’d see something like:
<|im_start|>system
My name is ChatGPT. I am a helpful assistant. I will only respond with the functions given to me.<|im_end|>
# function definitions are handled by the trainer/developer
<|im_start|>user
How is the weather today?<|im_end|>
<|im_start|>assistant
{
  "function_call": {
    "name": "get_current_weather",
    "arguments": "{\n \"location\": \"New York City, NY\"\n}"
  }
}
I haven’t gotten past rationalizing this part yet. Still working on it. If you find out how this is done, I’m open to feedback.