How to set minimum length of generated text in hosted API

I’m using the hosted API to generate text from gpt2-xl, like this:

curl -X POST https://api-inference.huggingface.co/models/gpt2-xl \
     -H "Authorization: Bearer api_org_AAAABBBBCCCCDDDD" \
     -H "Content-Type: application/json" \
     -d '{
          "inputs":"Once upon a time, there was a horrible witch who",
          "options":{"wait_for_model":true}
     }'

…which returns something like this:

[{"generated_text":"Once upon a time, there was a horrible witch who
had a cat named Chunky. She tortured and killed her cats and ate their
fur and meat with the help from a huge snake that her mother fed to her
with a spoon. The witch named"}]

Which is great.

Now I’d like to use a longer prompt (~1000 characters) and ask for a longer body of generated text in response (original length + ~1000 characters of new text).

But I don’t see any info in the docs about how to ask for a longer body of generated text. And if I make my prompt longer, the amount of generated text appended to my prompt gets proportionally shorter, and the whole response is about the same size. Is this a fundamental limitation of the hosted APIs or is there some way to achieve this?

Hi @benjismith ,

Sorry for the late reply.
Currently the only way you can do that is by using

 "inputs":"Once upon a time, there was a horrible witch who",
          "options":{"wait_for_model":true}
          "parameters": {"max_length": 10}

but that IS an issue, because max_length counts the prompt tokens as well as the generated ones, so you need to know how many tokens your prompt is to be precise.
We’re going to add a better parameter for this and document it.
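For reference, here is a minimal Python sketch of how the payload could be assembled to account for the prompt length. The whitespace-based token estimate is purely an assumption for illustration; a real count would come from the model's actual tokenizer:

```python
# Sketch: build a request payload where max_length covers the prompt
# plus roughly N newly generated tokens.
prompt = "Once upon a time, there was a horrible witch who"

# max_length counts prompt tokens PLUS generated tokens, so we need an
# estimate of the prompt's token count first. Splitting on whitespace is
# only a rough stand-in for the GPT-2 tokenizer (an assumption here).
approx_prompt_tokens = len(prompt.split())
desired_new_tokens = 50

payload = {
    "inputs": prompt,
    "options": {"wait_for_model": True},
    "parameters": {"max_length": approx_prompt_tokens + desired_new_tokens},
}

# This payload would then be POSTed as JSON to
# https://api-inference.huggingface.co/models/gpt2-xl
```

The key point is that `desired_new_tokens` alone can't be passed directly with `max_length`; the prompt's token count must be folded in.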

Perfect, thank you! Is there a ticket on GitHub for this that I can follow?