Inference Client chat completion parameter logit_bias not working

According to the docs, the logit_bias parameter for the chat_completion function expects a “JSON object that maps tokens (specified by their token ID in the tokenizer) to an associated bias value from -100 to 100”. The type annotation, however, says that it should be an Optional[List[float]].
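For reference, one way to check the annotation locally is to inspect the client's signature (the exact string may vary between huggingface_hub versions):

import inspect
from huggingface_hub import InferenceClient

# Prints something like: logit_bias: Optional[List[float]] = None
print(inspect.signature(InferenceClient.chat_completion).parameters["logit_bias"])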

Indeed, if I try to pass in a dictionary, e.g.

completion = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=messages,
    max_tokens=100,
    logit_bias={100: 4}
)

I get an HTTPError: 422 Client Error: Unprocessable Entity for url. I can pass in a list of floats, but I have no idea how a flat list is supposed to encode logit biases without a mapping.

Reproduction

from huggingface_hub import InferenceClient

client = InferenceClient(api_key="hf_xxx")

messages = [
    {
        "role": "user",
        "content": "The capital of France is"
    }
]

completion = client.chat_completion(
    model="meta-llama/Llama-3.3-70B-Instruct",
    messages=messages,
    max_tokens=20,
    logit_bias={100: 4}
)

print(completion.choices[0].message)

hi @taylorj94
There is a clearer error when you run the code:

Failed to deserialize the JSON body into the target type: logit_bias: invalid type: map, expected a sequence at line 1 column 137

But I have no idea how to express the biases without a mapping. 🙁


I’m not sure what these tests indicate, but I believe you could provide a list that is as long as the vocabulary size.

from huggingface_hub import InferenceClient
client = InferenceClient(api_key="hf_xxx")
messages = [
    {
        "role": "user",
        "content": "The capital of France is"
    }
]
completion = client.chat_completion(
    model="microsoft/Phi-3-mini-4k-instruct",
    messages=messages,
    max_tokens=20,
    logit_bias=30000 * [-100]
)
completion

Generated:
ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content='Paris', tool_calls=None), logprobs=None)], created=1735226825, id='', model='microsoft/Phi-3-mini-4k-instruct', usage=ChatCompletionOutputUsage(completion_tokens=2, prompt_tokens=8, total_tokens=10))


from huggingface_hub import InferenceClient
client = InferenceClient(api_key="hf_xxx")
messages = [
    {
        "role": "user",
        "content": "The capital of France is"
    }
]
completion = client.chat_completion(
    model="microsoft/Phi-3-mini-4k-instruct",
    messages=messages,
    max_tokens=20,
    logit_bias=30000 * [100]
)
completion

Generated:

ChatCompletionOutput(choices=[ChatCompletionOutputComplete(finish_reason='stop', index=0, message=ChatCompletionOutputMessage(role='assistant', content='The capital of France is Paris.', tool_calls=None), logprobs=None)], created=1735227061, id='', model='microsoft/Phi-3-mini-4k-instruct', usage=ChatCompletionOutputUsage(completion_tokens=8, prompt_tokens=8, total_tokens=16))

content='Paris' vs content='The capital of France is Paris.'

I guess you can create a list with len(vocab) * [100] and then assign a value of -100 to the specific token IDs you want to suppress; a rough sketch of that idea is below.
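A minimal sketch of that approach, assuming the list is indexed by token ID (that indexing is my guess from these tests, not documented behavior) and using 0 rather than 100 as the neutral baseline:

from huggingface_hub import InferenceClient
from transformers import AutoTokenizer

model_id = "microsoft/Phi-3-mini-4k-instruct"
tokenizer = AutoTokenizer.from_pretrained(model_id)

# Dense bias list: index i is assumed to hold the bias for token ID i.
# 0 leaves a token unbiased; -100 should effectively suppress it.
bias_map = {100: -100.0}  # the {token_id: bias} mapping we actually want
logit_bias = [0.0] * len(tokenizer)
for token_id, bias in bias_map.items():
    logit_bias[token_id] = bias

client = InferenceClient(api_key="hf_xxx")
completion = client.chat_completion(
    model=model_id,
    messages=[{"role": "user", "content": "The capital of France is"}],
    max_tokens=20,
    logit_bias=logit_bias,
)
print(completion.choices[0].message.content)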
