I’m using the hosted Inference API to generate text from gpt2-xl, like this:
curl -X POST https://api-inference.huggingface.co/models/gpt2-xl \
-H "Authorization: Bearer api_org_AAAABBBBCCCCDDDD" \
-H "Content-Type: application/json" \
-d '{
"inputs":"Once upon a time, there was a horrible witch who",
"options":{"wait_for_model":true}
}'
…which returns something like this:
[{"generated_text":"Once upon a time, there was a horrible witch who
had a cat named Chunky. She tortured and killed her cats and ate their
fur and meat with the help from a huge snake that her mother fed to her
with a spoon. The witch named"}]
Which is great.
Now I’d like to use a longer prompt (~1000 characters) and get a longer body of generated text back (the original prompt plus roughly another 1000 characters of new text).
But I can’t find anything in the docs about how to request more generated text. And if I make my prompt longer, the amount of new text appended to it gets proportionally shorter, so the whole response stays about the same size. Is this a fundamental limitation of the hosted API, or is there some way to achieve this?
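For reference, here’s my best guess at how to ask for more text. I guessed the "parameters" block and the max_new_tokens / return_full_text names from the transformers text-generation pipeline, not from anything I found in the Inference API docs, and 250 new tokens is just my rough estimate of ~1000 characters:

# guessing at a "parameters" block here; max_new_tokens / return_full_text are assumptions, not something I found in the docs
curl -X POST https://api-inference.huggingface.co/models/gpt2-xl \
-H "Authorization: Bearer api_org_AAAABBBBCCCCDDDD" \
-H "Content-Type: application/json" \
-d '{
"inputs":"<my ~1000-character prompt here>",
"parameters":{"max_new_tokens":250,"return_full_text":true},
"options":{"wait_for_model":true}
}'

Is something like this supported, or is there a different parameter I should be using?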