I am using CodeLlama on an Inference Endpoint, but I am unable to set `max_new_tokens` beyond roughly 1100 in the parameters:
"parameters": {
"max_new_tokens": 1024, # adjust this value to generate more tokens
"return_full_text": False,
}
It throws this error:

    {'error': 'Input validation error: `inputs` tokens + `max_new_tokens` must be <= 1512. Given: 345 `inputs` tokens and 1324 `max_new_tokens`', 'error_type': 'validation'}
Is there any way around this?
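
For context, the snippet above sits inside a standard POST to the endpoint; here is a minimal sketch of the full call, with the URL, token, and prompt as placeholders rather than my actual setup:

```python
# Minimal sketch of the request; URL, token, and prompt are placeholders.
import requests

API_URL = "https://<your-endpoint>.endpoints.huggingface.cloud"
HEADERS = {"Authorization": "Bearer <hf_token>"}

payload = {
    "inputs": "def fibonacci(n):",  # illustrative prompt
    "parameters": {
        "max_new_tokens": 1024,  # adjust this value to generate more tokens
        "return_full_text": False,
    },
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.json())
```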
I thought I would have to tinker with the Endpoint side, but it looks like there is a way to do it.
Appreciate your response, but it is in fact a container configuration of the endpoint. I updated it and am now able to generate responses of up to 10,000 tokens.
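
For anyone hitting the same wall: assuming the endpoint runs text-generation-inference (the validation error format matches its checks), the ceiling comes from the container's max-total-tokens setting. A sketch of the relevant container environment variables; the values here are illustrative, not the exact ones from my endpoint:

```python
# Illustrative container environment for a TGI-backed Inference Endpoint.
# Names follow text-generation-inference's launcher options; values are
# examples only.
container_env = {
    "MAX_INPUT_LENGTH": "2048",   # upper bound on prompt tokens
    "MAX_TOTAL_TOKENS": "12288",  # prompt tokens + max_new_tokens must fit here
}
```

With the total-tokens limit raised, the `inputs` tokens + `max_new_tokens` check passes for longer generations.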