Hey!
Ultimately I want a reasonably modern model to do cloze completion tasks. So, give it a sentence like:
“Alice and Bob went to the market to bu_” and ask it what goes in the blank (here, the highest-probability answer is “y”, completing “buy”).
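To be concrete about what I mean: once I have next-token probabilities, the cloze step is just an argmax over them. A minimal sketch of that step, with made-up numbers (the log-probabilities below are illustrative, not from any real model):

```python
import math

# Hypothetical next-token log-probabilities for the prompt
# "Alice and Bob went to the market to bu" (illustrative values only)
logprobs = {"y": -0.05, "t": -4.2, "s": -5.1, "rn": -6.0}

# The cloze answer is the token with the highest probability
best = max(logprobs, key=logprobs.get)  # -> "y"

# Convert log-probabilities back to probabilities if needed
probs = {tok: math.exp(lp) for tok, lp in logprobs.items()}
```

The hard part is getting `logprobs` out of the endpoint in the first place, which is what the rest of this post is about.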
I’d like to know the probabilities (or logits) of all the tokens it’s considering. I was doing this with OpenAI, but they made an API change that makes it very expensive.
I’m using a 72B model (liberated-qwen1-5-72b-kbd) on a Hugging Face hosted endpoint. Previously I used system messages explaining what I wanted (via OpenAI’s chat interface), but I suppose that’s not required here.
I’m doing this:
import requests

API_URL = "https://somestuff.us-east-1.aws.endpoints.huggingface.cloud"
headers = {
    "Accept": "application/json",
    "Authorization": "Bearer hf_something",
    "Content-Type": "application/json",
}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Can you please let us know more details about yo",
    "parameters": {
        "temperature": 1,
        "max_new_tokens": 10,
        "do_sample": False,
        "return_tensors": True,
        # "prefix": "somestuff"
    }
})
and I get back this: [{'generated_text': 'Can you please let us know more details about yoour issue? What is the issue you are facing'}]
It’s just a list with one dict inside. I don’t seem to be getting tensors back.
What am I doing wrong?