AI model (llama) is producing garbage output

I am trying to get a local deployment of a Llama model going and I am having some trouble. While I am able to get output from the model, it is not what I want at all. The output might be loosely related to the input, but it's mostly garbage.

Here is some sample input/output:

Input: Hello, what is your name?
AI Output:
What is your name? (BRYAN)

Where do you live?

I am from the west coast of America

Do I have any siblings or family members that might be able to help me out?
Nope

How old were they when their parents died?

Who was there for them after school hours and on weekends?

Did anyone else


The output cuts off at that point, presumably due to the max_length param. I am using the standard transformers pipeline; I have downloaded the Llama repository locally and point the pipeline at that repo like this:

import torch
import transformers
from transformers import AutoTokenizer

model = "llama/meta-llama/Llama-3.2-3"  # local path to the downloaded model repository
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",  # question-answering
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
    tokenizer=tokenizer,
)
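
For reference, the generation call that produces the output above looks roughly like this (the prompt and max_new_tokens value shown here are just placeholders, not my exact settings):

output = pipeline(
    "Hello, what is your name?",
    max_new_tokens=128,  # raising this (or max_length) controls where the output gets cut off
    do_sample=True,
)
print(output[0]["generated_text"])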

Any ideas on how to improve the output?


I think the problem is that you aren't using any prompt template. You should structure the input with a system prompt, the user and assistant turns, and so on. Also, the base model you are running isn't tuned for real-world conversational use, so I recommend switching to a Llama Instruct model. The Instruct variants are fine-tuned on real-world tasks such as chat and summarization, and they handle a wide variety of prompt formats much better.
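
For example, with an Instruct checkpoint you can let the tokenizer build the prompt from its chat template. This is only a sketch: the model id below (meta-llama/Llama-3.2-3B-Instruct) and the generation settings are examples, so swap in whatever checkpoint and limits you actually use.

import torch
import transformers
from transformers import AutoTokenizer

model_id = "meta-llama/Llama-3.2-3B-Instruct"  # example instruct checkpoint, replace with your own path
tokenizer = AutoTokenizer.from_pretrained(model_id)
pipe = transformers.pipeline(
    "text-generation",
    model=model_id,
    torch_dtype=torch.float16,
    device_map="auto",
    tokenizer=tokenizer,
)

messages = [
    {"role": "system", "content": "You are a helpful assistant."},
    {"role": "user", "content": "Hello, what is your name?"},
]
# apply_chat_template wraps the turns in the special tokens the instruct model was trained on
prompt = tokenizer.apply_chat_template(messages, tokenize=False, add_generation_prompt=True)
out = pipe(prompt, max_new_tokens=128, do_sample=True, temperature=0.7)
print(out[0]["generated_text"])

The important part is that the string you feed the model now contains the system/user markers it expects, instead of a bare question.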


I recommend you use llama-instruct model

I agree. Also, it seems you are using transformers directly, but that is really a library aimed at advanced users: detailed, specialized tasks, customizing models, and training.
If your main use case is chatting, I recommend Ollama instead, which is simpler, faster, and serves Instruct-tuned builds of the model by default.
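
For example, once Ollama is installed, something along these lines is usually all it takes (llama3.2 is the example tag for the small Instruct build; check Ollama's model library for the exact tag you want):

ollama pull llama3.2
ollama run llama3.2 "Hello, what is your name?"

You get a properly chat-formatted answer without having to manage prompt templates, dtypes, or device placement yourself.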