meta-llama/Llama-2-7b-chat-hf returns weird responses compared to the ones returned by the HF API

Or is that something that affects only speed?

Basically, that should be the case; there are few situations where you get half-baked results due to insufficient hardware performance. Either it works or it doesn't, and it's either fast or slow.

I found the official HF implementation for Llama 2. The setting tokenizer.use_default_system_prompt = False may be significant.

Since Llama 2 has been around for a long time, it has been affected by various HF specification changes, so there is likely some confusion about how to use it correctly.