I've loaded Mistral 7B on AWS SageMaker; the model has an 8K context length.
But I'm getting the error:
Number of tokens (2332) exceeded maximum context length (512).
on this line:
print(llm("""{Something with 3000 tokens}"""))
With the model's context length being 8K tokens, how can a "Number of tokens (2332) exceeded maximum context length (512)" error arise?
The code I'm using is:
Installation:
# Base ctransformers with no GPU acceleration
!pip install ctransformers
# Or with CUDA GPU acceleration
!pip install ctransformers[cuda]
# Or with AMD ROCm GPU acceleration (Linux only)
!CT_HIPBLAS=1 pip install ctransformers --no-binary ctransformers
# Or with Metal GPU acceleration for macOS systems only
!CT_METAL=1 pip install ctransformers --no-binary ctransformers
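If more than one of these variants ends up installed in the same environment, it can help to confirm which build is actually importable before loading a model; a generic check with the standard library (not a ctransformers-specific API):

from importlib.metadata import version

# Verifies the package resolves in this environment and reports its version.
print(version("ctransformers"))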
Code to load and run the model:
from ctransformers import AutoModelForCausalLM
# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Mistral-7B-v0.1-GGUF", model_file="mistral-7b-v0.1.Q5_K_M.gguf", model_type="mistral", gpu_layers=0)
print(llm("""{Something with 3000 tokens}"""))
I'm using an AWS SageMaker Studio Lab notebook, which provides a CPU-only runtime.
You can expand the context length with a config parameter. The 512 in the error is the loader's default context window, not the model's trained limit: ctransformers allocates its own context buffer at load time, independent of the 8K the model supports, so it has to be raised explicitly.
You could try this:
from ctransformers import AutoModelForCausalLM,AutoConfig
config = AutoConfig.from_pretrained("TheBloke/Mistral-7B-v0.1-GGUF")
# Explicitly set the max_seq_len
config.max_seq_len = 4096
config.max_answer_len = 1024
# Set gpu_layers to the number of layers to offload to GPU. Set to 0 if no GPU acceleration is available on your system.
llm = AutoModelForCausalLM.from_pretrained("TheBloke/Mistral-7B-v0.1-GGUF", model_file="mistral-7b-v0.1.Q5_K_M.gguf", model_type="mistral", gpu_layers=0, config=config)
print(llm("""{Something with 3000 tokens}"""))
I have the same issue even using the scripts above. Has anybody found a solution? Here is my test code and the results.
from ctransformers import AutoModelForCausalLM,AutoConfig
prompt = """
Please summarize below article in one sentence.
####
… 600 words
"""
print(llm(prompt))
(myenv) C:\myenv\test>python C:\myenv\test\mytest.py
…
Number of tokens (893) exceeded maximum context length (512).
Number of tokens (894) exceeded maximum context length (512).
Number of tokens (895) exceeded maximum context length (512).
{'input': 'Summary: ', 'text': ' 10006 7, 8, 4 3\n\t2121212983'}
duration: 119.195716
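One thing worth checking: as far as I can tell from the ctransformers docs, the config fields are named context_length and max_new_tokens rather than max_seq_len / max_answer_len, so the override in the answer above may be silently ignored (assigning unknown attributes on the config object succeeds, but the backend never reads them). from_pretrained also accepts these names directly as keyword arguments. A minimal sketch under that assumption:

from ctransformers import AutoModelForCausalLM

# context_length / max_new_tokens are the names documented by ctransformers;
# passing them here should replace the 512-token default context buffer.
llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-GGUF",
    model_file="mistral-7b-v0.1.Q5_K_M.gguf",
    model_type="mistral",
    gpu_layers=0,
    context_length=4096,  # must cover the prompt plus the generated tokens
    max_new_tokens=512,
)
print(llm("""{Something with 3000 tokens}"""))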
I do the following for a model whose context_size = 2048, and I still get this error:
"A single document was longer than the context length, we cannot handle this."
But the maximum token length among the documents is only 1024, and along with each document all I am sending is the question "What are conditions for termination".
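That message usually comes from the orchestration layer's own token budgeting rather than from the model itself: what has to fit is the document plus the question plus the chain's prompt template plus headroom for the answer, and the chain's per-call token limit may default to something below the model's 2048 context. A rough pre-flight check, as a sketch (the template and question here are illustrative stand-ins, and I'm assuming ctransformers' llm.tokenize for counting):

from ctransformers import AutoModelForCausalLM

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/Mistral-7B-v0.1-GGUF",
    model_file="mistral-7b-v0.1.Q5_K_M.gguf",
    model_type="mistral",
    context_length=2048,
)

# Illustrative stand-ins for whatever the chain actually injects.
TEMPLATE = "Use the context below to answer.\n\n{doc}\n\nQuestion: {question}\nAnswer:"
QUESTION = "What are conditions for termination"

def fits(doc: str, context_size: int = 2048, answer_headroom: int = 256) -> bool:
    """True if document + template + question + answer headroom fit the context."""
    prompt = TEMPLATE.format(doc=doc, question=QUESTION)
    return len(llm.tokenize(prompt)) + answer_headroom <= context_size

If fits() returns False for any document, either split that document further or raise the chain's own token limit to match the model's context.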