How to return more tokens when calling the inference endpoint?

For anyone finding this thread later: here is an example of what a request (for Llama 3 8B) might look like. The `max_new_tokens` parameter controls how many tokens the model is allowed to generate in its reply:

```json
{
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>How are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>",
    "parameters": {
        "max_new_tokens": 1000
    }
}
```
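
In case it helps, here is a rough sketch of sending that same payload from Python with the `requests` library. The endpoint URL and token below are placeholders, not real values; swap in your own Inference Endpoint URL and Hugging Face token:

```python
import requests

# Placeholders -- replace with your own Inference Endpoint URL and HF token.
API_URL = "https://your-endpoint-name.endpoints.huggingface.cloud"
HEADERS = {
    "Authorization": "Bearer hf_xxx",
    "Content-Type": "application/json",
}

# Llama 3 8B chat-style prompt using the model's special tokens,
# with max_new_tokens raised so the reply is not cut short.
payload = {
    "inputs": (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
        "How are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    ),
    "parameters": {"max_new_tokens": 1000},
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
response.raise_for_status()
print(response.json())
```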