How to return more tokens when calling the inference endpoint?

For anyone finding this thread later: here is an example of what a request (for Llama 3 8B) might look like. The `max_new_tokens` parameter controls how many tokens the model is allowed to generate in its reply:

```json
{
    "inputs": "<|begin_of_text|><|start_header_id|>user<|end_header_id|>How are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>",
    "parameters": {
        "max_new_tokens": 1000
    }
}
```
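
In case it helps, here is a rough sketch of sending that same payload from Python with the `requests` library. The endpoint URL and token below are placeholders, not real values; swap in your own Inference Endpoint URL and Hugging Face token:

```python
import requests

# Placeholders -- replace with your own Inference Endpoint URL and HF token.
API_URL = "https://your-endpoint-name.endpoints.huggingface.cloud"
HEADERS = {
    "Authorization": "Bearer hf_xxx",
    "Content-Type": "application/json",
}

# Llama 3 8B chat-style prompt using the model's special tokens,
# with max_new_tokens raised so the reply is not cut short.
payload = {
    "inputs": (
        "<|begin_of_text|><|start_header_id|>user<|end_header_id|>"
        "How are you?<|eot_id|><|start_header_id|>assistant<|end_header_id|>"
    ),
    "parameters": {"max_new_tokens": 1000},
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
response.raise_for_status()
print(response.json())
```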