Hi,
I'd like to use the model namespace-Pt/activation-beacon-llama2-7b-chat. It has a 400k-token context window, which is exactly what I need.
But when I launch a dedicated Inference Endpoint and query it, I get:

```
{'error': 'Input validation error: `inputs` tokens + `max_new_tokens` must be <= 2048. [...]'}
```
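For reference, here is roughly how I'm querying the endpoint (a minimal sketch; the URL, token, and prompt are placeholders, and 512 is just an example value for `max_new_tokens`):

```python
import os
import requests

# Placeholders: substitute your own endpoint URL and Hugging Face token.
ENDPOINT_URL = "https://xxxxxx.endpoints.huggingface.cloud"
HF_TOKEN = os.environ["HF_TOKEN"]

# A prompt that is well over 2048 tokens, to actually use the long context.
long_prompt = "..."  # truncated here

response = requests.post(
    ENDPOINT_URL,
    headers={
        "Authorization": f"Bearer {HF_TOKEN}",
        "Content-Type": "application/json",
    },
    json={
        "inputs": long_prompt,
        "parameters": {"max_new_tokens": 512},
    },
)
print(response.json())
# -> {'error': 'Input validation error: `inputs` tokens + `max_new_tokens` must be <= 2048 ...'}
```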
I already tried changing Max Input Length (per Query) and Max Number of Tokens (per Query) in the endpoint settings, but the problem persists.
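In case it matters, my (possibly wrong) understanding is that those two settings correspond to the text-generation-inference launcher limits in the container environment, i.e. something like the following; the variable names and values here are my assumption, not taken from the endpoint configuration:

```python
# My assumption about how the UI settings map onto the TGI launcher
# environment; the names and the 8192 values are illustrative only.
assumed_tgi_env = {
    "MAX_INPUT_LENGTH": "8192",   # 'Max Input Length (per Query)'
    "MAX_TOTAL_TOKENS": "8192",   # 'Max Number of Tokens (per Query)'
}
```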
So how can I actually use this model's large context window, please?
Thank you!