According to the docs, the `logit_bias` parameter of the `chat_completion` function expects a "JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100". The type annotation, however, says it should be an `Optional[List[float]]`.
Indeed, if I try to pass in a dictionary, e.g.

```python
completion = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=messages,
    max_tokens=100,
    logit_bias={100: 4},
)
```
I get an `HTTPError: 422 Client Error: Unprocessable Entity for url` error. I can pass in a list of floats, but I have no idea how a flat list of floats is supposed to encode logit biases without a token-to-bias mapping.
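As a possible workaround, the request can be sent over HTTP directly so the dict-shaped `logit_bias` never passes through the client-side type annotation. This is only a sketch: the endpoint URL and the assumption that the backend accepts the OpenAI-style mapping are mine, not from the docs.

```python
import requests

# Hypothetical workaround: POST the OpenAI-style payload directly.
# The URL below is an assumption based on the serverless Inference API layout.
API_URL = (
    "https://api-inference.huggingface.co/models/"
    "meta-llama/Llama-3.3-70B-Instruct/v1/chat/completions"
)
headers = {"Authorization": "Bearer hf_xxx"}
payload = {
    "model": "meta-llama/Llama-3.3-70B-Instruct",
    "messages": [{"role": "user", "content": "The capital of France is"}],
    "max_tokens": 20,
    "logit_bias": {"100": 4},  # token ID (string key) -> bias in [-100, 100]
}

response = requests.post(API_URL, headers=headers, json=payload)
print(response.status_code, response.json())
```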
Reproduction
```python
from huggingface_hub import InferenceClient

client = InferenceClient(api_key="hf_xxx")

messages = [
    {
        "role": "user",
        "content": "The capital of France is"
    }
]

completion = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=messages,
    max_tokens=20,
    logit_bias={100: 4}
)

print(completion.choices[0].message)
```
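For reference, this is how I would expect the documented dict form to be constructed if it were accepted: look up the token ID(s) to bias via the tokenizer, then map each ID to a bias value. A sketch only; biasing " Paris" is just an illustration.

```python
from transformers import AutoTokenizer

# Look up the sub-token IDs for the text we want to bias.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-3.3-70B-Instruct")
token_ids = tokenizer.encode(" Paris", add_special_tokens=False)

# Map every sub-token ID to a bias; values must stay in [-100, 100].
logit_bias = {token_id: 4 for token_id in token_ids}
print(logit_bias)
```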