I fine-tuned the Llama 2 7B model on SageMaker, but I get a token-limit error during inference.
Here is the error I get most of the time:
An error occurred (ModelError) when calling the InvokeEndpoint operation: Received client error (422) from primary with message "{"error":"Input validation error: `max_new_tokens` must be <= 1. Given: 500","error_type":"validation"}"
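For context, the invocation looks roughly like this (a minimal sketch; the endpoint name and prompt are placeholders, and the payload follows the TGI request schema):

import json
import boto3

runtime = boto3.client("sagemaker-runtime")
response = runtime.invoke_endpoint(
    EndpointName="llama2-7b-finetuned",          # hypothetical endpoint name
    ContentType="application/json",
    Body=json.dumps({
        "inputs": "<long prompt here>",
        "parameters": {"max_new_tokens": 500},   # the value rejected in the error
    }),
)
print(json.loads(response["Body"].read()))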
Does anyone have any idea why this happens and how I can solve it?
@philschmid , maybe you can help?
You can configure this via the environment variables:
'MAX_INPUT_LENGTH': json.dumps(1024),  # max length of input text
'MAX_TOTAL_TOKENS': json.dumps(2048),  # max length of the generation (including input text)
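These go into the env argument when creating the model, before deploying. A sketch of the full flow (the S3 path, TGI image version, and instance type are placeholders to adapt):

import json
import sagemaker
from sagemaker.huggingface import HuggingFaceModel, get_huggingface_llm_image_uri

config = {
    'HF_MODEL_ID': '/opt/ml/model',        # serve the fine-tuned weights extracted from model_data
    'SM_NUM_GPUS': json.dumps(1),          # GPUs per replica
    'MAX_INPUT_LENGTH': json.dumps(1024),  # max length of input text
    'MAX_TOTAL_TOKENS': json.dumps(2048),  # max length of the generation (including input text)
}

llm_model = HuggingFaceModel(
    model_data="s3://my-bucket/llama2-7b-finetuned/model.tar.gz",  # placeholder path
    role=sagemaker.get_execution_role(),
    image_uri=get_huggingface_llm_image_uri("huggingface", version="1.0.3"),  # placeholder version
    env=config,
)
llm = llm_model.deploy(
    initial_instance_count=1,
    instance_type="ml.g5.2xlarge",  # placeholder instance type
)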
Sorry for the late reply. This worked, thanks.
For a longer input with system prompts, can we increase the input length or the token length?
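For example, would something like this work, keeping the total within Llama 2's 4096-token context window?

config = {
    'MAX_INPUT_LENGTH': json.dumps(3000),  # leave room for long system prompts
    'MAX_TOTAL_TOKENS': json.dumps(4096),  # input + generated tokens combined
}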
Where exactly should these be configured? I ran into the same error while trying to invoke a deployed SageMaker endpoint. I passed these along when deploying the fine-tuned estimator to the endpoint, but I'm still getting the same error.