Problem for large context window (400k)

Hi,

I’d like to use the model namespace-Pt/activation-beacon-llama2-7b-chat. It’s a model with a 400k context window, which is what I want.
But when I launch a dedicated Inference Endpoint and try to use it, it returns {'error': 'Input validation error: inputs tokens + max_new_tokens must be <= 2048. […]'}.
I already tried modifying Max Input Length (per Query) and Max Number of Tokens (per Query), but I get the same error.
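
From what I understand, those UI settings correspond to env vars of the TGI container behind the endpoint, and the 2048 in the error looks like TGI's default MAX_TOTAL_TOKENS. Here is a minimal sketch of setting them when creating the endpoint through the API (untested; the endpoint name, instance type, region, and token values are placeholders, not recommendations):

```python
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "activation-beacon-test",  # hypothetical endpoint name
    repository="namespace-Pt/activation-beacon-llama2-7b-chat",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",          # placeholder vendor/region/instance
    region="us-east-1",
    type="protected",
    instance_size="x1",
    instance_type="nvidia-a10g",
    custom_image={
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
        "env": {
            "MODEL_ID": "/repository",
            # These should map to the UI's "Max Input Length" and
            # "Max Number of Tokens"; the values are just examples.
            "MAX_INPUT_LENGTH": "32768",
            "MAX_TOTAL_TOKENS": "33792",
            "MAX_BATCH_PREFILL_TOKENS": "32768",
        },
    },
)
endpoint.wait()      # block until the endpoint is up
print(endpoint.url)
```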

So how can I use this large-context-window model, please?

Thank you!

I need help!!!

No help, but just to let you know: I have a similar problem with the Gradient Llama 3 models. I need a big context window, but the Inference Endpoint somehow does not play along nicely. I asked support; no answer yet.

Hi, I also want to know!

Same here. I’m chasing my tail trying to deploy a large-context-window model on an Inference Endpoint.
Some general guidance on how to do it would be nice.
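
In the meantime, my working assumption is that once the server-side limits are raised, the client just has to keep prompt tokens + max_new_tokens under the endpoint's configured MAX_TOTAL_TOKENS. A minimal sketch of the call (the endpoint URL and token are placeholders):

```python
from huggingface_hub import InferenceClient

# Placeholder endpoint URL and token, only to illustrate the call shape.
client = InferenceClient(
    model="https://my-endpoint.endpoints.huggingface.cloud",
    token="hf_xxx",
)

# prompt tokens + max_new_tokens must stay <= the endpoint's MAX_TOTAL_TOKENS.
output = client.text_generation(
    "Summarize the following document: ...",
    max_new_tokens=1024,
)
print(output)
```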