Detailed parameters not working in BLOOM-176B

Good morning, I paid for a Community Pro subscription to get faster inference time with BLOOM. However, the “use_gpu” parameter is ignored when calling the API. The “num_return_sequences” parameter and some others seem to be ignored as well. This is a problem because I can’t generate different outputs for a given input, no matter what I try.

Following is a snippet from the Python code I’m running:

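import requests

TOKEN = "Bearer ###my token###"
headers = {"Authorization": TOKEN}
API_URL = ""

def query(payload):
    # POST the JSON payload to the Inference API and return the decoded response
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()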

payload = {
    "inputs": prompt,
    "parameters": {
        "max_new_tokens": 40,
        "temperature": 1.0,
        "do_sample": True,
        "return_full_text": False,  # does not work
        "num_return_sequences": 5,  # does not work
    },
    "options": {
        "use_gpu": True,  # does not work
        "use_cache": True,
        "wait_for_model": True,
    },
}

output_text = query(payload)
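
A possible workaround while “num_return_sequences” is ignored, offered only as a sketch: the “use_cache” option is documented as returning cached results for requests the API has already seen, so with sampling enabled it may be worth sending the same request several times with “use_cache” set to False. Whether this actually yields different outputs for BLOOM is an assumption I have not confirmed. The helper name sample_n is hypothetical, and post stands for the query function above.

```python
import copy

def sample_n(post, payload, n):
    """Send the same payload n times with caching disabled and collect the responses.

    `post` is any callable that takes a payload dict and returns the decoded
    response (for example the `query` function above). Hypothetical helper.
    """
    # Work on a copy so the caller's payload is left untouched
    request = copy.deepcopy(payload)
    # Assumption: disabling the cache forces a fresh generation on each call
    request.setdefault("options", {})["use_cache"] = False
    return [post(request) for _ in range(n)]
```

With the snippet above this would be called as outputs = sample_n(query, payload, 5), at the cost of five separate requests instead of one.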

UP. Is anyone from HF able to help?