How can I return more tokens when calling an Inference Endpoint?

Hi,

I purchased a Pro account and spun up a WizardLM/WizardLM-13B-V1.1 endpoint for text generation. The API only seems to return a maximum of about 75 characters. I’ve tried adding max_tokens, min_tokens, and return_full_text, but they don’t seem to have any effect.

How can I increase the number of tokens returned?

Here’s an example of what I’ve tried:

curl https://abcd.us-east-1.aws.endpoints.huggingface.cloud \
-X POST \
-d '{"inputs":"I want to build a large parking structure.  Can you tell me the steps I would take?", "options": {"min_tokens": 5000, "max_tokens": 5000, "return_full_text": true}}' \
-H "Authorization: Bearer $HF_TOKEN" \
-H "Content-Type: application/json"

Hello @maplesake, please check the documentation and the blog post.

@maplesake Hello. I have the same problem. Have you found a solution?
I don’t understand why this isn’t clearly explained in the documentation; there is nothing at all on the subject.

I agree the documentation could be much better. The max_new_tokens parameter is what you are looking for.
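To make that concrete, here is a sketch of the request with generation settings moved under a "parameters" object instead of "options", which is where text-generation endpoints expect them. The URL is the placeholder from the original post, and the max_new_tokens value of 500 is just an illustration:

```shell
# Generation settings go under "parameters"; "max_new_tokens" caps how many
# new tokens the model generates. Endpoint URL is the placeholder from above.
curl https://abcd.us-east-1.aws.endpoints.huggingface.cloud \
  -X POST \
  -d '{"inputs":"I want to build a large parking structure. Can you tell me the steps I would take?", "parameters": {"max_new_tokens": 500, "return_full_text": true}}' \
  -H "Authorization: Bearer $HF_TOKEN" \
  -H "Content-Type: application/json"
```

Note that max_new_tokens counts tokens, not characters, so the response length in characters will vary with the tokenizer.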