How can I return more tokens when calling an Inference Endpoint?

Hi,

I purchased a Pro account and spun up a WizardLM/WizardLM-13B-V1.1 endpoint for text generation. The API only seems to return a maximum of about 75 characters. I’ve tried adding max_tokens, min_tokens, and return_full_text, but they don’t seem to have any effect.

How can I increase the number of tokens returned?

Here’s an example of what I’ve tried:

curl https://abcd.us-east-1.aws.endpoints.huggingface.cloud \
-X POST \
-d '{"inputs":"I want to build a large parking structure.  Can you tell me the steps I would take?", "options": {"min_tokens": 5000, "max_tokens": 5000, "return_full_text": true}}' \
-H "Authorization: Bearer $HF_TOKEN" \
-H "Content-Type: application/json"

Hello @maplesake, please check the documentation and the blog post.

@maplesake Hello. I have the same problem. Have you found a solution?
I don’t understand why this isn’t clearly explained in the documentation; there is nothing at all on the subject.

I agree the documentation could be much better. The max_new_tokens parameter is what you are looking for.
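To make that concrete, here is a sketch of the request with generation settings moved under a "parameters" object instead of "options", which is where text-generation endpoints expect them. The URL is the placeholder from the original post, and the max_new_tokens value of 500 is just an illustration:

```shell
# Generation settings go under "parameters"; "max_new_tokens" caps how many
# new tokens the model generates. Endpoint URL is the placeholder from above.
curl https://abcd.us-east-1.aws.endpoints.huggingface.cloud \
  -X POST \
  -d '{"inputs":"I want to build a large parking structure. Can you tell me the steps I would take?", "parameters": {"max_new_tokens": 500, "return_full_text": true}}' \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json"
```

Note that max_new_tokens counts tokens, not characters, so the response length in characters will vary with the tokenizer.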