I’m attempting to run a query similar to this one against the Hugging Face inference endpoints:
import requests

api_url = 'https://api-inference.huggingface.co/models/meta-llama/Meta-Llama-3.1-70B-Instruct'
headers = {'Authorization': f'Bearer {token}'}
response = requests.post(api_url, headers=headers, json={'inputs': 'What is the capital of France? The capital of France is: '})
I’m not just looking for the answer, but also for the logits of the generated sequence: I want to be able to compute the probability of getting a particular answer.
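For context, the computation I have in mind is the standard one: the probability of an answer is the product of the per-token conditional probabilities, each obtained by a softmax over the logits at that step. A minimal sketch with made-up logits (the toy vocabulary and values are purely illustrative):

```python
import math

def sequence_logprob(step_logits, answer_token_ids):
    """Sum the log-softmax probability of each answer token,
    given one logits vector per generation step."""
    total = 0.0
    for logits, tok in zip(step_logits, answer_token_ids):
        # log-sum-exp with max-subtraction for numerical stability
        m = max(logits)
        log_z = m + math.log(sum(math.exp(x - m) for x in logits))
        total += logits[tok] - log_z
    return total

# Toy example: vocabulary of 3 tokens, answer is token 2 then token 0.
step_logits = [[1.0, 0.5, 2.0], [3.0, 0.0, 1.0]]
logprob = sequence_logprob(step_logits, [2, 0])
print(math.exp(logprob))  # probability of that two-token answer
```

So all I really need from the API is the per-step logits (or log-probabilities) for the tokens it generates.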
I can do this locally with AutoModelForCausalLM, but most big models don’t fit on my GPUs (and an HF Pro subscription is cheaper than another A100).
Is there any way to get this information from the API?