What is the difference between llama2_7B and llama2_7B_hf?

We deployed both the llama2_7b and llama2_7b_hf models on a local network to compare their performance. Even though we asked the same questions, the two models gave different results, and in my case llama2_7b produced more satisfying answers.

The implementation code is as follows.
from transformers import AutoTokenizer, AutoModelForCausalLM

# Download the chat checkpoint from the Hub and save a local copy.
tokenizer = AutoTokenizer.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
tokenizer.save_pretrained("Llama2-7b-tokenizer")

model = AutoModelForCausalLM.from_pretrained("meta-llama/Llama-2-7b-chat-hf")
model.save_pretrained("Llama2-7b-model")

When implementing llama2_7b, we converted the original Meta weights ourselves using the Transformers conversion script:
python convert_llama_weights_to_hf.py
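
For completeness, the conversion was invoked roughly like this (the paths are placeholders, and the script location and flags can vary slightly between transformers versions):

python src/transformers/models/llama/convert_llama_weights_to_hf.py \
    --input_dir /path/to/llama-2-7b-original \
    --model_size 7B \
    --output_dir ./llama2-7b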

We then asked both models the same question:
prompt = "What is the capital of South Korea?"
and the answers differed.

The hf model gave a strange answer,
but the 7b model answered correctly.
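
For reference, the answers were generated roughly like this (a sketch from memory; the exact max_new_tokens and sampling settings are assumptions on my part):

from transformers import AutoTokenizer, AutoModelForCausalLM

# The same code was run against each of the two saved checkpoints.
tokenizer = AutoTokenizer.from_pretrained("Llama2-7b-tokenizer")
model = AutoModelForCausalLM.from_pretrained("Llama2-7b-model")

prompt = "What is the capital of South Korea?"
inputs = tokenizer(prompt, return_tensors="pt")

# Greedy decoding for a deterministic comparison; with do_sample=True
# outputs can differ from run to run even on identical weights.
outputs = model.generate(**inputs, max_new_tokens=64, do_sample=False)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))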

I thought these two models were the same, so why do I get different answers to the same question?
Is there anything I missed?