"Number of tokens (2331) exceeded maximum context length (512)" error, even though the model supports an 8k context length.

This worked for me. The error comes from the default 512-token context window; passing context_length explicitly when loading overrides it:

from ctransformers import AutoModelForCausalLM  # note: ctransformers, not Hugging Face transformers

llm = AutoModelForCausalLM.from_pretrained(
    "TheBloke/zephyr-7B-beta-GGUF",
    model_file="zephyr-7b-beta.Q5_K_M.gguf",
    model_type="mistral",
    gpu_layers=50,          # number of layers to offload to the GPU
    max_new_tokens=1000,    # cap on generated tokens
    context_length=6000)    # raise the context window above the 512-token default

No warnings are printed.
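
For completeness, here is a minimal sketch of how you might confirm the larger window, assuming the llm object from above (the prompt text is just filler chosen to exceed 512 tokens):

# Build a prompt well past the old 512-token limit (placeholder filler text).
long_prompt = "Summarize the following text:\n" + ("lorem ipsum " * 400)

# With context_length=6000 this call no longer raises the
# "exceeded maximum context length (512)" error.
print(llm(long_prompt, max_new_tokens=200))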