Hi there,
I deployed an aws-llama-2-7b-chat-hf model and tried to query it, but I get the following error:
{'error': 'Input validation error: `inputs` tokens + `max_new_tokens` must be <= 1512. Given: 3294 `inputs` tokens and 20 `max_new_tokens`', 'error_type': 'validation'}
But the context length for all Llama 2 models is supposed to be 4k tokens. What gives? Can anyone shed some light?