How can I get the logits from an endpoint call?

I’m attempting to run a query like the following using the Hugging Face inference endpoints.

import requests

api_url = 'https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3.1-70B-Instruct'
headers = {'Authorization': f'Bearer {token}'}
response = requests.post(api_url, headers=headers,
                         json={'inputs': 'What is the capital of France? The capital of France is:'})

I’m not just looking for the generated answer, but also for the logits of the generated tokens: I want to be able to compute the probability of getting a particular answer.

I can do this with AutoModelForCausalLM, but most big models don’t fit on my GPUs (and an HF Pro subscription is cheaper than another A100).
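For reference, this is roughly what I do locally (a minimal sketch; the 1B model here is just a stand-in that fits on a single GPU):

import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

# Small model as an illustration; the 70B model is what doesn't fit.
model_name = 'meta-llama/Llama-3.2-1B-Instruct'
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name)

prompt = 'What is the capital of France? The capital of France is:'
inputs = tokenizer(prompt, return_tensors='pt')

with torch.no_grad():
    logits = model(**inputs).logits  # shape: (1, seq_len, vocab_size)

# Probability distribution over the next token after the prompt
next_token_probs = torch.softmax(logits[0, -1], dim=-1)
token_id = tokenizer.encode(' Paris', add_special_tokens=False)[0]
print(f'P(" Paris") = {next_token_probs[token_id].item():.4f}')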

Is there any way to use the API this way?


Hi,

According to the Detailed parameters docs, the serverless inference API does not support returning logits.

If you want that, you could define a custom handler on Inference Endpoints that returns logits (or token log-probabilities) alongside the generated text.
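As a rough sketch (not a drop-in implementation): a handler.py along these lines could work. The EndpointHandler class is the interface Inference Endpoints expects; everything else, including returning per-token log-probabilities rather than full vocabulary logits to keep the response small, is just one way to do it.

# handler.py at the root of the model repository
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

class EndpointHandler:
    def __init__(self, path=''):
        self.tokenizer = AutoTokenizer.from_pretrained(path)
        # device_map='auto' assumes accelerate is available on the endpoint image
        self.model = AutoModelForCausalLM.from_pretrained(
            path, torch_dtype=torch.float16, device_map='auto'
        )

    def __call__(self, data):
        prompt = data['inputs']
        inputs = self.tokenizer(prompt, return_tensors='pt').to(self.model.device)

        with torch.no_grad():
            out = self.model.generate(
                **inputs,
                max_new_tokens=data.get('parameters', {}).get('max_new_tokens', 20),
                output_scores=True,           # keep the per-step scores
                return_dict_in_generate=True,
            )

        new_tokens = out.sequences[0, inputs['input_ids'].shape[1]:]
        text = self.tokenizer.decode(new_tokens, skip_special_tokens=True)

        # Log-probability of each generated token under the model
        logprobs = [
            torch.log_softmax(score[0], dim=-1)[tok].item()
            for score, tok in zip(out.scores, new_tokens)
        ]
        return {'generated_text': text, 'token_logprobs': logprobs}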


Just checking: can I call an Inference Endpoint with logits like this from an experiment running on my local machine, or do I have to use Gradio, Spaces, or some other library like that?

+1 I’d be interested in an easy/standard way to do this as well.