I loaded a GGUF model with llama.cpp and I'm using LlamaIndex to query it. But when I run the query and print the response:
query_str = """Describe all the parameters of the material discussed in the text."""
response = query_engine.query(query_str)
print(response)
it just prints nothing useful and only shows this warning:
/usr/local/lib/python3.10/dist-packages/llama_cpp/llama.py:1129: RuntimeWarning: Detected duplicate leading "<s>" in prompt, this will likely reduce response quality, consider removing it...
warnings.warn(
What could be causing this, and how do I fix it?