Hello, apologies for the newbie question.
I am using the following code to obtain outputs from BLOOM. I have set the "return_full_text" parameter to False, but I always get the full input text back along with the predicted completion in the generated output string, regardless of whether this is set to True or False. I can confirm that the "max_new_tokens" parameter is working, because the length of the output string varies appropriately when I change it. The other parameter I have in there, "return_full_text": False, has no effect on the output.
Do I have the syntax wrong? The code was drawn from this post:
import requests

API_URL = "https://api-inference.huggingface.co/models/bigscience/bloom"
headers = {"Authorization": "Bearer <TOKEN>"}

def query(payload):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()

while True:
    text_input = input("Insert your input: ")
    output = query({
        "inputs": text_input,
        "parameters": {"max_new_tokens": 64,
                       "return_full_text": False},
        "options": {"use_gpu": True, "use_cache": False}
    })
    print(output)
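In case it helps anyone with the same problem: while waiting for an answer on why the parameter is ignored, I've been stripping the echoed prompt client-side. This is a minimal sketch; it assumes the API returns a list of dicts with a "generated_text" key (which is the shape I see in the printed output above) and that the completion begins with a verbatim copy of the prompt.

```python
def strip_prompt(prompt: str, generated: str) -> str:
    """Remove the echoed prompt from the start of the generated text, if present."""
    if generated.startswith(prompt):
        return generated[len(prompt):]
    return generated

def completion_only(prompt: str, output) -> str:
    """Extract just the new tokens from the API output.

    Assumes the response shape is [{"generated_text": "..."}], as observed
    when printing the output above.
    """
    return strip_prompt(prompt, output[0]["generated_text"])
```

Then `print(completion_only(text_input, output))` in the loop prints only the continuation instead of prompt plus continuation.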
Also, the "use_gpu": True option doesn't appear to work. I am on the paid $9-a-month plan, so it should be available to me, but results don't come back any faster with this option. I've timed some sample queries that take about 14-15 seconds to return, and I get more or less the same response time whether use_gpu is set to True or False. I'm not sure what is happening; possibilities include: (a) I have the syntax wrong, (b) it doesn't make much of a difference anyway (which I doubt), (c) the system thinks I'm not allowed to use the GPU, or something else entirely.
To troubleshoot, it would be nice if I could get back a debug-level response, but it appears the response body is limited to the generated text. Is there any way to induce the system to return more information, such as what it interpreted the input data as, what all the parameters were actually set to when it generated the response, and so on?
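One thing worth trying while waiting for an answer: the response body may only contain the generated text, but the raw requests.Response object also carries a status code and HTTP headers, and APIs often put diagnostic information in custom "x-" headers. This is a sketch, not a documented feature of the Inference API; whether any useful headers are present is something to check empirically.

```python
def diagnostic_headers(headers) -> dict:
    """Pick out custom (x-...) headers, which sometimes carry server-side
    diagnostics such as compute type or timing. Takes any dict-like mapping
    of header names to values."""
    return {k: v for k, v in headers.items() if k.lower().startswith("x-")}

# Usage with the query function above (sketch):
# response = requests.post(API_URL, headers=headers, json=payload)
# print(response.status_code)          # e.g. 200, or an error code
# print(diagnostic_headers(response.headers))
```

If the headers turn out to be empty, at least the status code distinguishes a rejected request (e.g. a permissions problem with use_gpu) from a silently ignored parameter.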
Again sorry for newbie questions, thanks for any help.