Accelerated Inference API not taking parameters?

Hello! I’m trying to generate text using a fine-tuned T5, and I’m running into some truncation issues.

From the docs, I can see that sending the max_new_tokens parameter should let me get longer answers, but the API always responds with the same length no matter what I do. I also tried varying the parameter names; the API does validate unknown parameters and responds with a 400, but it keeps truncating the response when I send what looks like a correct request.

If I use the model in transformers, I get longer responses.

Here is what I’m sending.

const inference_endpoint = "https://api-inference.huggingface.co/models/squidgy/t5-ml-finetuned-gec";

// Request sent with axios; `query` holds the input text and
// `await_for_model` is a boolean defined elsewhere in my code.
const response = await axios({
  url: inference_endpoint,
  method: "post",
  headers: {
    "Authorization": "Bearer " + process.env.HF_TOKEN,
    "Content-Type": "application/json"
  },
  data: {
    inputs: query,
    parameters: {
      max_new_tokens: 196
    },
    options: {
      wait_for_model: await_for_model
    }
  }
});

Am I doing anything wrong? Thanks in advance for your help!!


Hi,
Same problem here.
How can one define the maximum output length in the Inference API?
Thanks

Hi @juancavallotti,
Did you try ‘max_length’ instead of ‘max_new_tokens’?
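For example, something like this (just a sketch; it reuses the endpoint, query, and token from the post above, and uses fetch only for illustration):

// Same request as above, but with max_length in place of max_new_tokens.
const response = await fetch(
  "https://api-inference.huggingface.co/models/squidgy/t5-ml-finetuned-gec",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer " + process.env.HF_TOKEN,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      inputs: query,
      parameters: {
        max_length: 196   // instead of max_new_tokens: 196
      },
      options: { wait_for_model: true }
    })
  }
);
console.log(await response.json());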

Yup, the parameter didn't get rejected, but it didn't work either.

Any update/solution on this issue? I'm encountering the same problem with the BLOOM model; none of the advanced parameters of the Inference API seem to work.

Same problem for me and I didn’t receive any reply in my thread. It seems like the Inference API for BLOOM is basically broken and only allows basic generation.

Personally, I’ve found the following parameters being ignored:

  • max_new_tokens
  • temperature
  • do_sample
  • use_gpu (but this is to be expected; afaik HF handles GPU on the Inference API with separate pricing)

do_sample is particularly frustrating as BLOOM generates the same output over and over for a given input. I managed to force a bit of variation by playing with top_k, but this is not very rigorous.
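For reference, here is a sketch of the kind of request where I see this behavior (the prompt and parameter values are only illustrative, and fetch is used just for the example):

// Text-generation request against the hosted BLOOM endpoint.
// In my tests, do_sample, temperature, and max_new_tokens appear to be
// ignored, while top_k does change the output somewhat.
const response = await fetch(
  "https://api-inference.huggingface.co/models/bigscience/bloom",
  {
    method: "POST",
    headers: {
      "Authorization": "Bearer " + process.env.HF_TOKEN,
      "Content-Type": "application/json"
    },
    body: JSON.stringify({
      inputs: "The quick brown fox",
      parameters: {
        max_new_tokens: 64,
        do_sample: true,
        temperature: 0.9,
        top_k: 50
      }
    })
  }
);
console.log(await response.json());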