Change input_ids via API Inference for Text Generation

Hi! I’ve been messing around with a script that summarizes text obtained from an audio transcription, so the text length is pretty variable. When it gets long enough (not much, really), I get this error from the query (which returns response.json()):

{'error': 'Input length of input_ids is 99, but max_length is set to 20. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.', 'warnings': ['There was an inference error: Input length of input_ids is 99, but max_length is set to 20. This can lead to unexpected behavior. You should consider increasing max_length or, better yet, setting max_new_tokens.']}

I haven’t been able to find documentation on how to change this parameter for an inference deployment. How can I change the value?

Here’s the code snippet:

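import requests
import streamlit as st

def query(payload, API_URL, headers):
    # POST the JSON payload to the inference endpoint
    response = requests.post(API_URL, headers=headers, json=payload)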
    return response.json()

@st.cache_resource(show_spinner = False)
def summarize_(text, prompt, model, key):

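    if len(text) > limit_of_tokens:
        # Keep the last, largest possible fragment; the final parts of the
        # conversation are normally the most important.
        # Note: len(text) counts characters, not tokens.
        start = len(text) - limit_of_tokens
        limit_index = text.find(".", start)
        # Cut after the next sentence end; fall back to a hard cut if there is none
        text = text[limit_index + 1:] if 0 <= limit_index < len(text) - 1 else text[start:]
    full_query = prompt + text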

    API_URL_summarize = f"{model}"
    headers = {"Authorization": f"Bearer {key}"}

    data = query({
        "inputs": full_query,
        "parameters": {"return_full_text": False},
        "options": {"wait_for_model": True},
    }, API_URL_summarize, headers)
    result = data[0]['generated_text']
    return result
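
Following the hint in the error message itself, one thing to try is setting max_new_tokens inside "parameters" of the payload. Below is a minimal sketch, assuming the hosted text-generation endpoint accepts max_new_tokens there. The helper name build_payload and the value 100 are hypothetical and not part of the original script:

```python
def build_payload(full_query, max_new_tokens=100):
    """Build the request body, capping how many tokens the model may generate."""
    return {
        "inputs": full_query,
        "parameters": {
            "return_full_text": False,
            # Assumption: the endpoint forwards this to generation, so the
            # prompt length no longer collides with the default max_length.
            "max_new_tokens": max_new_tokens,
        },
        "options": {"wait_for_model": True},
    }

# Hypothetical example input, just to show the resulting structure
payload = build_payload("Summarize the following conversation: ...", max_new_tokens=100)
```

The result would then be passed to the existing helper, e.g. data = query(build_payload(full_query), API_URL_summarize, headers). Whether 100 is a sensible cap depends on the model and on how long the summaries should be.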