Hello,
I am running into the following two issues with the free Inference API.
- I cannot run large models using the Inference API. For example, if I run the following:
import requests

API_URL = "https://api-inference.huggingface.co/models/EleutherAI/gpt-neox-20b"
headers = {"Authorization": "Bearer <MY_API_KEY_HERE>"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Can you please let us know more details about your ",
})
print(output)
I get this error:
{'error': 'Model EleutherAI/gpt-neox-20b is currently loading', 'estimated_time': 1651.7474365234375}
Why does this happen, and is there a way around the issue?
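From the docs I gather that the model may simply need time to load, and that I can pass a `wait_for_model` option or poll using the `estimated_time` field. Below is a minimal retry sketch of what I have in mind (the `options` key and the back-off logic are my reading of the docs, not something I have confirmed works for a 20B model):

import time
import requests

API_URL = "https://api-inference.huggingface.co/models/EleutherAI/gpt-neox-20b"
headers = {"Authorization": "Bearer <MY_API_KEY_HERE>"}

def query_with_retry(payload, max_retries=5):
    # Ask the API to hold the request until the model has loaded,
    # instead of immediately returning the "currently loading" error.
    payload = {**payload, "options": {"wait_for_model": True}}
    result = None
    for _ in range(max_retries):
        response = requests.post(API_URL, headers=headers, json=payload)
        result = response.json()
        # If the model is still loading, back off for the time the API
        # estimates (capped, so a huge estimate doesn't stall forever).
        if isinstance(result, dict) and "error" in result:
            time.sleep(min(result.get("estimated_time", 30), 60))
            continue
        return result
    return result

Even so, the estimated_time above is roughly 27 minutes, so I am not sure this is practical on the free tier.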
- Even for the smaller models that I can run successfully, the output differs from the one generated in the web user interface. For example, the code below
import requests

API_URL = "https://api-inference.huggingface.co/models/EleutherAI/gpt-neo-2.7B"
headers = {"Authorization": "Bearer <MY_API_KEY_HERE>"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

output = query({
    "inputs": "Can you please let us know more details about your ",
})
print(output)
generated the following:
[{'generated_text': 'Can you please let us know more details about your \nschedule.\n\nThanks, \nLiz Taylor \n\n-----Original Message----- \nFrom: Dasovich, Jeff [mailto:Jeff.D'}]
but the output shown in the inference widget on the model's website is different.
Why is this the case? Is there a way to make the outputs from the free Inference API more closely match those of the web UI?
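My guess is that the web widget uses sampling settings that differ from the API defaults. Here is a sketch of what I tried, passing explicit generation parameters (the parameter names follow the text-generation task docs as I understand them; the values themselves are guesses on my part):

import requests

API_URL = "https://api-inference.huggingface.co/models/EleutherAI/gpt-neo-2.7B"
headers = {"Authorization": "Bearer <MY_API_KEY_HERE>"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

# Explicit generation parameters, since the widget and the API may use
# different defaults. Names follow the text-generation task documentation;
# the values here are guesses.
output = query({
    "inputs": "Can you please let us know more details about your ",
    "parameters": {
        "temperature": 1.0,      # sampling temperature
        "max_new_tokens": 50,    # length of the completion
        "do_sample": True,       # the widget appears to sample rather than greedy-decode
    },
    "options": {"use_cache": False},  # avoid being served a cached previous completion
})
print(output)

If sampling is enabled, I understand the output will differ from run to run anyway, so perhaps exact agreement with the widget is not achievable?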
Thanks