Hello, I was wondering to what extent the probabilities for tool use can be extracted using the approach described in this thread.
I am using a simple tool call with the Cohere Command R+ model. Two tools are passed to the model: 1) a web search and 2) a call to another LLM. Whenever I prompt it with a knowledge question such as "What is the biggest penguin?", the output is a tool selection plus the query parameters for an internet search. In this case, the probability for each token is 100%, except for two tokens of the search query. My assumption was that whenever I prompt the LLM with a message that cannot be solved with the available tools (and the default "directly answer", which uses the LLM itself to answer the query), the confidence would be significantly lower.
However, if I add two tools for multiplying and summing integers and prompt with the query "what is the temperature for today", I still get high confidence for the "directly answer" action. Is my assumption wrong, i.e. can the probabilities as defined above not be used to estimate the model's confidence in a tool call?
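To make the question concrete, here is a minimal sketch of what I mean by "probability of a tool call": summing the log-probs over the tokens that spell out the tool name, then exponentiating. Everything in this helper, including its name and the span heuristics, is my own assumption, not part of the Cohere API; it just relies on the token layout visible in the table below.

```python
import numpy as np

# Minimal sketch (my own helper, not a Cohere API): collapse the per-token
# log-probs into one tool-choice confidence by summing over the tokens that
# spell out the `tool_name` value, then exponentiating.
# `tokens` and `logprobs` are assumed to be the decoded token strings and
# transition scores printed by the loop at the bottom of this post.
def tool_name_confidence(tokens: list[str], logprobs: list[float]) -> float:
    i = tokens.index("name")              # locate the `"tool_name":` key
    start = i + 3                         # skip `name`, `":` and the opening `"`
    total = 0.0
    for tok, lp in zip(tokens[start:], logprobs[start:]):
        if tok.lstrip().startswith('"'):  # closing quote ends the value
            break
        total += lp
    return float(np.exp(total))           # joint probability of the name tokens
```

For the output below this would give exp(-0.0003) ≈ 99.97%, i.e. the model is essentially certain about `web_search` as the tool name.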
Token output with probabilities
| token | token string | log-prob | probability |
|---|---|---|---|
| 9814 | Action | 0.0000 | 100.00% |
| 33 | : | 0.0000 | 100.00% |
| 15080 | ``` | 0.0000 | 100.00% |
| 6329 | json | 0.0000 | 100.00% |
| 206 | (whitespace) | 0.0000 | 100.00% |
| 66 | [ | 0.0000 | 100.00% |
| 1856 | (whitespace) | 0.0000 | 100.00% |
| 1936 | { | 0.0000 | 100.00% |
| 1890 | (whitespace) | 0.0000 | 100.00% |
| 1789 | " | 0.0000 | 100.00% |
| 22018 | tool | 0.0000 | 100.00% |
| 70 | _ | 0.0000 | 100.00% |
| 2769 | name | 0.0000 | 100.00% |
| 2209 | ": | 0.0000 | 100.00% |
| 1789 | " | 0.0000 | 100.00% |
| 6903 | web | -0.0003 | 99.97% |
| 70 | _ | 0.0000 | 100.00% |
| 9363 | search | 0.0000 | 100.00% |
| 2040 | ", | 0.0000 | 100.00% |
| 1890 | (whitespace) | 0.0000 | 100.00% |
| 1789 | " | 0.0000 | 100.00% |
| 21508 | parameters | 0.0000 | 100.00% |
| 2209 | ": | 0.0000 | 100.00% |
| 1936 | { | 0.0000 | 100.00% |
| 2087 | (whitespace) | 0.0000 | 100.00% |
| 1789 | " | 0.0000 | 100.00% |
| 8417 | query | 0.0000 | 100.00% |
| 2209 | ": | 0.0000 | 100.00% |
| 1789 | " | 0.0000 | 100.00% |
| 214226 | biggest | -0.8203 | 44.03% |
| 211829 | penguin | -0.0036 | 99.64% |
| 7754 | species | 0.0000 | 100.00% |
| 9 | " | 0.0000 | 100.00% |
| 1890 | (whitespace) | 0.0000 | 100.00% |
| 2046 | } | 0.0000 | 100.00% |
| 1856 | (whitespace) | 0.0000 | 100.00% |
| 2046 | } | 0.0000 | 100.00% |
| 206 | (whitespace) | 0.0000 | 100.00% |
| 68 | ] | 0.0000 | 100.00% |
| 206 | (whitespace) | 0.0000 | 100.00% |
| 3802 | ``` | 0.0000 | 100.00% |
| 255001 | (end-of-turn token) | 0.0000 | 100.00% |

Rows marked (whitespace) are newline/indentation tokens whose strings do not print; token 255001 is the model's end-of-turn special token.
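Since the per-token scores are log-probs, the confidence in the generation as a whole is their exponentiated sum (equivalently, the product of the per-token probabilities). For the table above only three tokens carry any uncertainty:

```python
import numpy as np

# Joint probability of the whole generation above: exp of the summed
# log-probs. Only three tokens have non-zero scores, so the product is
# dominated by the "biggest" token at 44.03%.
logprobs = [-0.0003, -0.8203, -0.0036]  # "web", "biggest", "penguin"
print(f"{np.exp(sum(logprobs)):.2%}")   # ~43.86%
```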
Code snippet
The code snippet that I use can be found below:
```python
from transformers import AutoTokenizer, AutoModelForCausalLM
import numpy as np

model_id = "CohereForAI/c4ai-command-r-plus-4bit"
tokenizer = AutoTokenizer.from_pretrained(model_id, token="HF_TOKEN")  # HF_TOKEN is a placeholder
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    device_map="auto",
    token="HF_TOKEN",
)

# Format message with the command-r tool use template
conversation = [
    {"role": "user", "content": "What is the biggest penguin?"}
]

# Define tools available for the model to use:
tools = [
    {
        "name": "internet_search",
        "description": "Returns a list of relevant document snippets for a textual query retrieved from the internet",
        "parameter_definitions": {
            "query": {
                "description": "Query to search the internet with",
                "type": "str",
                "required": True,
            }
        },
    },
    {
        "name": "directly_answer",
        "description": "Calls a standard (un-augmented) AI chatbot to generate a response given the conversation history",
        "parameter_definitions": {},
    },
]

formatted_input = tokenizer.apply_tool_use_template(
    conversation, tools=tools, tokenize=True, add_generation_prompt=True, return_tensors="pt"
)

outputs = model.generate(
    formatted_input,
    max_new_tokens=100,
    do_sample=True,
    temperature=0.3,
    return_dict_in_generate=True,
    output_scores=True,
)

# With normalize_logits=True the transition scores are log-probabilities
transition_scores = model.compute_transition_scores(
    outputs.sequences, outputs.scores, normalize_logits=True
)

input_length = formatted_input.shape[1]
generated_tokens = outputs.sequences[:, input_length:]
for tok, score in zip(generated_tokens[0], transition_scores[0]):
    # | token | token string | log-prob | probability
    score = score.cpu().numpy()  # .cpu() needed when the model runs on GPU
    print(f"| {tok:5d} | {tokenizer.decode(tok):8s} | {score:.4f} | {np.exp(score):.2%}")
```
Edit: formatted table