Problem with a large context window (400k)

Hi,

I’d like to use namespace-Pt/activation-beacon-llama2-7b-chat, a model with a 400k-token context window, which is exactly what I need.
But when I deploy it on a dedicated Inference Endpoint and query it, I get: {'error': 'Input validation error: inputs tokens + max_new_tokens must be <= 2048. [...]'}.
I have already tried increasing Max Input Length (per Query) and Max Number of Tokens (per Query) in the endpoint settings, but I get the same error.
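
For reference, here is roughly how I query the endpoint (the URL, token, and prompt below are placeholders; my real prompt is well over 2048 tokens):

```python
import requests

# Placeholder endpoint URL and token
API_URL = "https://<my-endpoint>.endpoints.huggingface.cloud"
HEADERS = {
    "Authorization": "Bearer hf_xxx",
    "Content-Type": "application/json",
}

long_prompt = "..."  # a document well over 2048 tokens

# Standard text-generation-inference request format
payload = {
    "inputs": long_prompt,
    "parameters": {"max_new_tokens": 512},  # illustrative value
}

response = requests.post(API_URL, headers=HEADERS, json=payload)
print(response.json())
# -> {'error': 'Input validation error: inputs tokens + max_new_tokens must be <= 2048. [...]'}
```

From what I understand, the two UI settings should map to text-generation-inference's --max-input-length and --max-total-tokens options, so I expected raising them to lift the 2048 limit, but it didn't.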

So how can I use this model’s large context window, please?

Thank you!
