Hello! I’m trying to generate text using a fine-tuned T5, and I’m running into some truncation issues.
From the docs, I can see that sending the max_new_tokens parameter should give me longer answers, but the API always responds with the same length no matter what I do. I also tried varying the parameter names, and the API does validate unknown parameters (it responds with 400), yet it keeps truncating the output when I send what looks like a correct request.
If I run the model locally with transformers, I get longer responses.
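For comparison, the local call looks roughly like this (a sketch: I'm using t5-small here as a stand-in for my fine-tuned checkpoint, and a dummy query):

from transformers import AutoTokenizer, AutoModelForSeq2SeqLM

model_name = "t5-small"  # stand-in for my fine-tuned model
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForSeq2SeqLM.from_pretrained(model_name)

query = "translate English to German: The house is wonderful."
inputs = tokenizer(query, return_tensors="pt")
# Locally, max_new_tokens is respected and I get a full-length answer.
output_ids = model.generate(**inputs, max_new_tokens=196)
result = tokenizer.decode(output_ids[0], skip_special_tokens=True)
print(result)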
Here is what I’m sending:
const axios = require("axios");

const inference_endpoint = "https://api-inference.huggingface.co/models/squidgy/t5-ml-finetuned-gec";

const response = await axios({
  url: inference_endpoint,
  method: "post",
  headers: {
    "Authorization": "Bearer " + process.env.HF_TOKEN,
    "Content-Type": "application/json"
  },
  data: {
    inputs: query,
    parameters: {
      max_new_tokens: 196
    },
    options: {
      wait_for_model: await_for_model
    }
  }
});
Am I doing something wrong? Thanks in advance for your help!