Number of tokens (2331) exceeded maximum context length (512) error, even though the model supports an 8k context length

I've loaded Mistral-7B on AWS SageMaker; the model supports an 8k context length.

But I'm getting the error:

Number of tokens (2332) exceeded maximum context length (512).

on the line:

print(llm("""{Something with 3000 tokens}"""))

With the model's context length of 8k tokens, how can a "Number of tokens (2332) exceeded maximum context length (512)" error arise?

The code I'm using is below.
Imports:

# Base ctransformers with no GPU acceleration
!pip install ctransformers
# Or with CUDA GPU acceleration
!pip install ctransformers[cuda]
# Or with AMD ROCm GPU acceleration (Linux only)
!CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
# Or with Metal GPU acceleration for macOS systems only
!CT_METAL=1 pip install ctransformers --no-binary ctransformers

Code to load and run the model:

from ctransformers import AutoModelForCausalLM

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-GGUF",
    model_file="mistral-7b-v0.1.Q5_K_M.gguf",
    model_type="mistral",
    gpu_layers=0,
)
print(llm("""{Something with 3000 tokens}"""))

I'm using an AWS SageMaker Studio Lab notebook that provides a CPU instance:

instance - t3.xlarge
vCPUs - 4
memory - 16GB

By default the context is capped at 512 tokens (that's the limit shown in the error message); you can expand the context length with a config parameter.

You could try this:

from ctransformers import AutoModelForCausalLM,AutoConfig

config = AutoConfig.from_pretrained("TheBloke/Mistral-7B-v0.1-GGUF")
# Explicitly set the max_seq_len
config.max_seq_len = 4096
config.max_answer_len= 1024

# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-GGUF",
    model_file="mistral-7b-v0.1.Q5_K_M.gguf",
    model_type="mistral",
    gpu_layers=0,
    config=config,
)
print(llm("""{Something with 3000 tokens}"""))

I have the same issue even when using the scripts above. Has anybody found a solution? Here is my testing code and the results.
from ctransformers import AutoModelForCausalLM, AutoConfig

model_name = "TheBloke/Mistral-7B-Instruct-v0.1-GGUF"
model_path = r"D:\Mistral\mistral-7b-instruct-v0.1.Q6_K.gguf"
config = AutoConfig.from_pretrained(model_name)
config.max_seq_len = 4096
config.max_answer_len = 1024
llm = AutoModelForCausalLM.from_pretrained(model_name, model_file=model_path, model_type="mistral", gpu_layers=0, config=config)

prompt = """
Please summarize below article in one sentence.
####
... 600 words
"""
print(llm(prompt))

(myenv) C:\myenv\test >python C:\myenv\test\mytest.py
…
Number of tokens (893) exceeded maximum context length (512).
Number of tokens (894) exceeded maximum context length (512).
Number of tokens (895) exceeded maximum context length (512).
{'input': 'Summary: ', 'text': ' 10006 7, 8, 4 3\n\t2121212983'}
duration: 119.195716

Set the config like below:
config.config.max_new_tokens = 2048
config.config.context_length = 4096

It solved the issue for me.
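
For reference, here is a minimal sketch that combines this nested-config fix with the loading code from the original post (model and file names reused from above; treat it as an illustration of this reply, not tested output):

from ctransformers import AutoModelForCausalLM, AutoConfig

config = AutoConfig.from_pretrained("TheBloke/Mistral-7B-v0.1-GGUF")
# The attributes that take effect live on the inner config object
config.config.max_new_tokens = 2048
config.config.context_length = 4096

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-GGUF",
    model_file="mistral-7b-v0.1.Q5_K_M.gguf",
    model_type="mistral",
    gpu_layers=0,
    config=config,
)
print(llm("""{Something with 3000 tokens}"""))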

It does not work for me either. I got "Number of tokens (692) exceeded maximum context length (512)".

This worked for me:

llm = AutoModelForCausalLM.from_pretrained("TheBloke/zephyr-7B-beta-GGUF", 
                                           model_file="zephyr-7b-beta.Q5_K_M.gguf", 
                                           model_type="mistral", 
                                           gpu_layers=50,
                                           max_new_tokens = 1000,
                                           context_length = 6000)

No warnings output.

llm = AutoModelForCausalLM.from_pretrained("TheBloke/Llama-2-7B-GGUF", model_file="llama-2-7b.Q6_K.gguf", model_type="llama", context_length=4096, max_new_tokens=4096, gpu_layers=0)

This worked for me.
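
If a setting doesn't seem to take effect, a quick sanity check is to compare the loaded model's effective context window against the token count of your prompt. A minimal diagnostic sketch, assuming your ctransformers build exposes llm.tokenize() and a context_length property (worth verifying for your version):

# Diagnostic: confirm the effective context window and the prompt size.
prompt = "..."  # your long prompt here
tokens = llm.tokenize(prompt)
print("effective context length:", llm.context_length)
print("prompt tokens:", len(tokens))
# Remember to leave headroom for generation (max_new_tokens) on top of the prompt
if len(tokens) > llm.context_length:
    print("Prompt alone exceeds the context window; raise context_length or shorten the prompt.")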

Also, I think I read somewhere that ctransformers only supports Llama and two other base models for context-length changes. Please correct me if I am wrong.