Hi,
I just downloaded the Llama 2 model from the Meta repository and am working with llama.cpp on my Mac. Now I want to use it in a Python script. The quantized model file (ggml-model-q4_0.bin) is stored locally on my Mac. How can I use the Llama tokenizer and load the model from that file? I'm used to getting models directly from Hugging Face rather than loading them from my laptop.
I downloaded the Llama-7B-chat model and then used the llama.cpp repository to quantize it. As a result, I have two folders, LLama and Llama.cpp, where the actual model is stored (see screenshot).
Thanks in advance!