Hi! I've been experimenting with a script that summarizes text obtained from an audio transcription, so the text length varies a lot. When the text gets long enough (not very long, really), the query (which returns `response.json()`) gives me this error:
```
{'error': 'Input length of input_ids is 99, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.', 'warnings': ['There was an inference error: Input length of input_ids is 99, but `max_length` is set to 20. This can lead to unexpected behavior. You should consider increasing `max_length` or, better yet, setting `max_new_tokens`.']}
```
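The error compares two different limits. As a simplified model, assuming a decoder-only text-generation model (which the `return_full_text` parameter suggests), `max_length` caps the prompt plus the generated tokens, while `max_new_tokens` caps only the generated tokens and takes precedence when both are set. With a 99-token prompt and a `max_length` of 20, nothing is left to generate. The helper `new_token_budget` below is hypothetical and only illustrates that arithmetic:

```python
def new_token_budget(input_len, max_length=None, max_new_tokens=None):
    """Return how many tokens a decoder-only model may still generate.

    Simplified model (assumption): max_length caps prompt + output,
    max_new_tokens caps only the output and wins when both are set.
    """
    if max_new_tokens is not None:
        return max_new_tokens
    if max_length is not None:
        # max_length includes the prompt, so a long prompt eats the budget
        return max(0, max_length - input_len)
    raise ValueError("set max_length or max_new_tokens")


# The situation from the error: 99 prompt tokens, max_length of 20
print(new_token_budget(99, max_length=20))                      # 0
print(new_token_budget(99, max_length=20, max_new_tokens=100))  # 100
```

This is why the message recommends `max_new_tokens`: its meaning does not shift as the transcription gets longer.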
I haven't been able to find documentation on how to change this parameter for an inference deployment. How can I change the value?
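A minimal sketch of a request body that sets `max_new_tokens` in the same `parameters` object as `return_full_text`, assuming the model is served under the text-generation task. `build_payload` is a hypothetical helper and 200 is an arbitrary example value; the hosted API may cap or reject large values, so check the parameter documentation for the task your model uses:

```python
def build_payload(prompt, text, max_new_tokens=200):
    """Build a request body for a text-generation model (hypothetical helper).

    max_new_tokens sits next to return_full_text inside "parameters";
    200 is an arbitrary example, not a recommended setting.
    """
    return {
        "inputs": prompt + text,
        "parameters": {
            "return_full_text": False,
            # Limits only the generated tokens, independent of input length
            "max_new_tokens": max_new_tokens,
        },
        "options": {"wait_for_model": True},
    }


payload = build_payload("Summarize the following conversation:\n", "Speaker A: ...")
print(payload["parameters"])
# {'return_full_text': False, 'max_new_tokens': 200}
```

The resulting dict can be sent with `requests.post(API_URL, headers=headers, json=payload)` exactly like the existing request.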
Here's the code snippet:
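```python
import requests
import streamlit as st

# token and limit_of_tokens are defined elsewhere in the script


def query(payload, API_URL, headers):
    response = requests.post(API_URL, headers=headers, json=payload)
    return response.json()


@st.cache_resource(show_spinner=False)
def summarize_(text, prompt, model, key):
    if len(text) > limit_of_tokens:
        # Keep the last biggest possible fragment, starting at a sentence
        # boundary; normally the last parts of the conversation are the most
        # important. Note: len(text) counts characters, not tokens.
        limit_index = text.find(".", len(text) - limit_of_tokens)
        if limit_index != -1:
            text = text[limit_index + 1:]
        else:
            # No sentence boundary in the tail: keep the raw tail instead
            text = text[-limit_of_tokens:]
    full_query = prompt + text
    API_URL_summarize = f"https://api-inference.huggingface.co/models/{model}"
    headers = {"Authorization": f"Bearer {token}"}
    data = query(
        {
            "inputs": full_query,
            "parameters": {"return_full_text": False},
            "options": {"wait_for_model": True},
        },
        API_URL_summarize,
        headers,
    )
    # On failure the API returns a dict like {'error': ...} instead of a list
    if isinstance(data, dict) and "error" in data:
        raise RuntimeError(data["error"])
    result = data[0]["generated_text"]
    return result
```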
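Because `len(text)` counts characters rather than tokens, a character limit does not guarantee that the prompt fits the model. A sketch of keeping the most recent complete sentences under a budget, assuming whitespace-separated words as a rough proxy for tokens (a real tokenizer, for example one loaded with `transformers.AutoTokenizer`, would give exact counts). `keep_tail` and `max_words` are hypothetical names:

```python
import re


def keep_tail(text, max_words):
    """Keep the longest run of complete final sentences within max_words.

    Words are a rough stand-in for model tokens (assumption); real token
    counts depend on the model's tokenizer and are usually higher.
    """
    if max_words <= 0:
        return ""
    # Split after sentence-ending punctuation followed by whitespace
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    kept = []
    count = 0
    # Walk backwards so the most recent sentences are kept first
    for sentence in reversed(sentences):
        n = len(sentence.split())
        if count + n > max_words:
            break
        kept.append(sentence)
        count += n
    if not kept:
        # Last sentence alone is too long: fall back to its final words
        return " ".join(text.split()[-max_words:])
    return " ".join(reversed(kept))


print(keep_tail("First point. Second point here. Final decision made.", 6))
# Second point here. Final decision made.
```

Walking backwards matches the intent of the original comment (the end of the conversation matters most) while never cutting a sentence in half unless a single sentence exceeds the budget.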