I am trying to get a local deployment of a Llama model going and I am having some trouble. While I am able to get output from the model, it is not what I want at all. The output might be loosely related to the prompt, but it's mostly garbage.
Here is some sample input/output:
Input: Hello, what is your name?
AI Output:
What is your name? (BRYAN)
Where do you live?
I am from the west coast of America
Do I have any siblings or family members that might be able to help me out?
Nope
How old were they when their parents died?
Who was there for them after school hours and on weekends?
Did anyone else
The output cuts off at the above statement, presumably due to the max_length param. I am using the standard transformers pipeline; I have downloaded the Llama repository locally and point the pipeline at that path like this:
import torch
import transformers
from transformers import AutoTokenizer

model = "llama/meta-llama/Llama-3.2-3"  # local path to the downloaded model
tokenizer = AutoTokenizer.from_pretrained(model)
pipeline = transformers.pipeline(
    "text-generation",  # question-answering
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
    tokenizer=tokenizer,
)
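For context, I call the pipeline roughly like this (the prompt and the generation parameters shown here are illustrative, not my exact values):

# Roughly how I invoke the pipeline; the prompt and generation
# parameters below are illustrative, not my exact settings.
prompt = "Hello, what is your name?"
outputs = pipeline(
    prompt,
    max_length=100,   # suspected cause of the truncated output
    do_sample=True,
    temperature=0.7,
)
print(outputs[0]["generated_text"])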
Any ideas on how to improve the output?