Hello everyone, I am trying to use Llama-2 (7B) from Hugging Face. With the code below I was able to load the model successfully, but when I try to generate output it takes forever.
```python
from transformers import AutoTokenizer, AutoModelForCausalLM

tokenizer = AutoTokenizer.from_pretrained("Llama-2-7b-hf")
model = AutoModelForCausalLM.from_pretrained("Llama-2-7b-hf")

input_ids = tokenizer.encode("What is LLM?", return_tensors="pt")
output = model.generate(
    input_ids,
    do_sample=False,  # greedy decoding (temperature=0 is not a valid sampling setting)
    max_new_tokens=100,
)
generated_text = tokenizer.decode(output[0])  # decode the first (only) sequence
print(generated_text)
```
Model files were downloaded from the Llama-2-7b-hf repository.
Hardware: MacBook Pro (M2 Pro), 16 GB RAM
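One likely cause of the slowness is that `from_pretrained` defaults to float32 on CPU, so a 7B model needs ~28 GB and swaps heavily on a 16 GB machine. A minimal sketch of loading in float16 and moving the model to Apple's Metal (MPS) backend when available (this assumes a PyTorch build with MPS support; the model name follows the question):

```python
import torch
from transformers import AutoTokenizer, AutoModelForCausalLM


def pick_device() -> str:
    # Prefer the Apple-Silicon GPU (MPS backend) when PyTorch supports it,
    # otherwise fall back to CPU.
    return "mps" if torch.backends.mps.is_available() else "cpu"


def load_llama(name: str = "Llama-2-7b-hf"):
    device = pick_device()
    tokenizer = AutoTokenizer.from_pretrained(name)
    # float16 roughly halves memory (~28 GB fp32 -> ~14 GB fp16 for 7B),
    # which matters a lot on a 16 GB machine.
    model = AutoModelForCausalLM.from_pretrained(
        name, torch_dtype=torch.float16
    ).to(device)
    return tokenizer, model, device
```

When generating, remember to move the input tensor to the same device, e.g. `input_ids.to(device)`, or `generate` will raise a device-mismatch error.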