Llama 2 deployed with different context lengths?

Hi there,

I deployed an aws-llama-2-7b-chat-hf model and tried to query it, but I get the following error:

{'error': 'Input validation error: inputs tokens + max_new_tokens must be <= 1512. Given: 3294 inputs tokens and 20 max_new_tokens', 'error_type': 'validation'}

But the context length for all Llama 2 models is supposed to be 4k. What gives? Can anyone shed some light?

You can configure those values in the advanced section when creating a new endpoint; 1512 is the default.
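Until the endpoint is reconfigured, you can also work around the limit client-side. A minimal sketch, assuming a TGI-style endpoint (the URL is a placeholder): ask the server to truncate the input so that input tokens plus `max_new_tokens` stay within the 1512 budget, using TGI's `truncate` parameter.

```python
import json

# Placeholder URL; replace with your own Inference Endpoint.
ENDPOINT_URL = "https://your-endpoint.endpoints.huggingface.cloud"

def build_payload(prompt: str, max_new_tokens: int = 20,
                  max_total_tokens: int = 1512) -> dict:
    """Build a TGI request that fits the endpoint's token budget.

    TGI rejects requests where input tokens + max_new_tokens exceed
    max_total_tokens, so we ask the server to truncate the input down
    to the remaining budget instead of failing validation.
    """
    return {
        "inputs": prompt,
        "parameters": {
            "max_new_tokens": max_new_tokens,
            # Keep only the last (max_total_tokens - max_new_tokens)
            # input tokens; the server drops the rest.
            "truncate": max_total_tokens - max_new_tokens,
        },
    }

payload = build_payload("a very long prompt ...", max_new_tokens=20)
print(json.dumps(payload))
```

You would then POST this payload (with your auth header) to `ENDPOINT_URL`. Note that truncation silently drops the start of long prompts, so raising `max_total_tokens` on the endpoint itself is the better long-term fix.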
