Accelerated Inference API not taking parameters?

Hello! I’m trying to generate text using a fine-tuned T5, and I’m running into some truncation issues.

From the docs, I can see that if I send the parameter max_new_tokens I can potentially get longer answers, but the API always responds with the same length no matter what I do. Here is the payload I’m sending. I also tried varying the parameter names: the API does validate unknown parameters and responds with a 400, but it keeps truncating the response when I send what looks like a correct request.

If I use the model in transformers, I get longer responses.
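For reference, this is roughly the local transformers call I’m comparing against (a minimal sketch: t5-small and the example input stand in for my fine-tuned model and real query):

```python
# Minimal local comparison: generation length follows max_new_tokens.
# "t5-small" and the input text are placeholders, not the actual model/query.
from transformers import AutoTokenizer, T5ForConditionalGeneration

tokenizer = AutoTokenizer.from_pretrained("t5-small")
model = T5ForConditionalGeneration.from_pretrained("t5-small")

inputs = tokenizer("translate English to German: The house is wonderful.",
                   return_tensors="pt")

# A tight token budget truncates the output...
short = model.generate(**inputs, max_new_tokens=4)
# ...while a larger budget lets the model finish (it stops at EOS on its own).
longer = model.generate(**inputs, max_new_tokens=196)

print(tokenizer.decode(short[0], skip_special_tokens=True))
print(tokenizer.decode(longer[0], skip_special_tokens=True))
```

Locally the second call produces the full translation, which is why I expected max_new_tokens to behave the same way through the API.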

Here is what I’m sending.

const axios = require("axios");

const inference_endpoint = "";

const response = await axios({
  url: inference_endpoint,
  method: "post",
  headers: {
    "Authorization": "Bearer " + process.env.HF_TOKEN,
    "Content-Type": "application/json"
  },
  data: {
    inputs: query,
    parameters: {
      max_new_tokens: 196
    },
    options: {
      wait_for_model: await_for_model
    }
  }
});

Am I doing anything wrong? Thanks in advance for your help!!

Same problem here.
How can one define the maximum output length in the Inference API?

Hi @juancavallotti,
Did you try ‘max_length’ instead of ‘max_new_tokens’?

Yup, the parameter didn’t get rejected, but it didn’t work either.