Good morning, I paid for a Community Pro subscription to get faster inference with BLOOM. However, the “use_gpu” option is ignored when calling the API. The “num_return_sequences” parameter and some others (for example “return_full_text”) seem to be ignored as well. This is a problem because I can’t generate different outputs for a given input, no matter what I try.
Here is a snippet of the Python code I’m running:
```python
import requests

TOKEN = "Bearer ###my token###"
headers = {"Authorization": TOKEN}
API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"

def query(payload):
    # POST the JSON payload to the hosted BLOOM endpoint and decode the reply
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

prompt = "Once upon a time,"  # placeholder; the real prompt is set elsewhere in my script

payload = {
    "inputs": prompt,
    "parameters": {
        "max_new_tokens": 40,
        "temperature": 1.0,
        "do_sample": True,
        "return_full_text": False,  # does not work
        "num_return_sequences": 5,  # does not work
        "repetition_penalty": 100.0,
    },
    "options": {
        "use_gpu": True,  # does not work
        "use_cache": True,
        "wait_for_model": True,
    },
}
output_text = query(payload)
```
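One possible workaround, sketched under assumptions rather than confirmed: the Inference API documents a cache layer controlled by the `use_cache` option, and with `use_cache` set to `True` an identical request may be answered from that cache, so sampling would never produce a new result. The sketch below emulates `num_return_sequences` by sending several independent requests with the cache disabled. The helper names `make_payload`, `sample_n` and `make_poster` are hypothetical, and the sketch assumes the text-generation response is a list of `{"generated_text": ...}` dicts.

```python
from typing import Any, Callable, Dict, List

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"


def make_payload(prompt: str) -> Dict[str, Any]:
    """Build one sampling request with the response cache disabled."""
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": 40,
            "temperature": 1.0,
            "do_sample": True,
            "return_full_text": False,
        },
        "options": {
            # Assumption: disabling the cache forces a freshly sampled generation
            "use_cache": False,
            "wait_for_model": True,
        },
    }


def sample_n(prompt: str, n: int,
             post: Callable[[Dict[str, Any]], Any]) -> List[str]:
    """Emulate num_return_sequences by sending n independent requests."""
    outputs = []
    for _ in range(n):
        result = post(make_payload(prompt))
        # Assumed response shape: [{"generated_text": "..."}]
        outputs.append(result[0]["generated_text"])
    return outputs


def make_poster(token: str) -> Callable[[Dict[str, Any]], Any]:
    """Return a function that POSTs a payload to the BLOOM endpoint."""
    headers = {"Authorization": f"Bearer {token}"}

    def post(payload: Dict[str, Any]) -> Any:
        # Imported here so the sampling logic can be exercised without the network
        import requests

        response = requests.post(API_URL, headers=headers, json=payload, timeout=120)
        response.raise_for_status()  # surface HTTP errors instead of parsing them
        return response.json()

    return post
```

Usage would look like `texts = sample_n(prompt, 5, make_poster("###my token###"))`. Passing the poster in as a function keeps the request loop independent of the HTTP client, at the cost of one API call (and its latency) per returned sequence.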