Problem for large context window (400k)

Hi,

I’d like to use the model namespace-Pt/activation-beacon-llama2-7b-chat. It’s a model with a 400k context window, which is what I want.
But when I launch a dedicated Inference Endpoint and try to use it, it returns {'error': 'Input validation error: inputs tokens + max_new_tokens must be <= 2048. […]'}.
I already tried modifying Max Input Length (per Query) and Max Number of Tokens (per Query), but I get the same error.
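
From what I understand, those UI settings correspond to env vars of the TGI container behind the endpoint, and the 2048 in the error looks like TGI's default MAX_TOTAL_TOKENS. Here is a minimal sketch of setting them when creating the endpoint through the API (untested; the endpoint name, instance type, region, and token values are placeholders, not recommendations):

```python
from huggingface_hub import create_inference_endpoint

endpoint = create_inference_endpoint(
    "activation-beacon-test",  # hypothetical endpoint name
    repository="namespace-Pt/activation-beacon-llama2-7b-chat",
    framework="pytorch",
    task="text-generation",
    accelerator="gpu",
    vendor="aws",          # placeholder vendor/region/instance
    region="us-east-1",
    type="protected",
    instance_size="x1",
    instance_type="nvidia-a10g",
    custom_image={
        "health_route": "/health",
        "url": "ghcr.io/huggingface/text-generation-inference:latest",
        "env": {
            "MODEL_ID": "/repository",
            # These should map to the UI's "Max Input Length" and
            # "Max Number of Tokens"; the values are just examples.
            "MAX_INPUT_LENGTH": "32768",
            "MAX_TOTAL_TOKENS": "33792",
            "MAX_BATCH_PREFILL_TOKENS": "32768",
        },
    },
)
endpoint.wait()      # block until the endpoint is up
print(endpoint.url)
```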

So how can I use this large-context-window model, please?

Thank you!

I need help!!!

No help, but just to let you know: I have a similar problem with the Gradient Llama 3 models. I need a big context window, but the Inference Endpoint somehow does not play along nicely. I asked support; no answer yet.

Hi, I also want to know!

Same here. I’m chasing my tail trying to deploy a large-context-window model on an Inference Endpoint.
Some general guidance on how to do it would be nice.
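
In the meantime, my working assumption is that once the server-side limits are raised, the client just has to keep prompt tokens + max_new_tokens under the endpoint's configured MAX_TOTAL_TOKENS. A minimal sketch of the call (the endpoint URL and token are placeholders):

```python
from huggingface_hub import InferenceClient

# Placeholder endpoint URL and token, only to illustrate the call shape.
client = InferenceClient(
    model="https://my-endpoint.endpoints.huggingface.cloud",
    token="hf_xxx",
)

# prompt tokens + max_new_tokens must stay <= the endpoint's MAX_TOTAL_TOKENS.
output = client.text_generation(
    "Summarize the following document: ...",
    max_new_tokens=1024,
)
print(output)
```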