Models slow on M1 Pro (16 GB)

Hi all, I'm enjoying the transformers library, but it runs very slowly for me. A small request with Llama 2 7B takes 15-30 minutes, whereas the same prompt finishes in a few seconds when I run it through ollama.ai. I'm just using the pipeline method; has anyone else seen this? Here is my setup:

conda create --name pytorch39 python=3.9
conda activate pytorch39
conda install -c huggingface transformers 
conda install pytorch-nightly::pytorch torchvision torchaudio -c pytorch-nightly
conda install -c anaconda pillow libtiff
conda install -c conda-forge accelerate einops
huggingface-cli login 
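
Before the script, one sanity check I ran (a minimal sketch, assuming the PyTorch nightly build actually ships with MPS support): whether PyTorch can see the Metal backend at all, since otherwise everything silently falls back to CPU.

import torch

# If either of these prints False, the model is running on CPU,
# which would explain the 15-30 minute generations.
print(torch.backends.mps.is_available())  # is the Metal backend usable at runtime?
print(torch.backends.mps.is_built())      # was this PyTorch build compiled with MPS support?

The script itself: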
import torch
import transformers
from transformers import AutoTokenizer, pipeline

# debug logging so I can see where the time is going
transformers.logging.set_verbosity_debug()

model = "meta-llama/Llama-2-7b-chat-hf"  # alternatively: meta-llama/Llama-2-7b-hf
llama_pipeline = pipeline(
    "text-generation",
    model=model,
    torch_dtype=torch.float16,
    device_map="auto",
)
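
Right after building the pipeline I also print where the weights actually landed (a rough check on my side; as far as I know hf_device_map is only populated when accelerate dispatches the model via device_map):

# where did device_map="auto" actually put the model?
print(llama_pipeline.device)                                   # e.g. device(type='mps') or device(type='cpu')
print(getattr(llama_pipeline.model, "hf_device_map", None))    # per-module placement, if accelerate dispatched it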

tokenizer = AutoTokenizer.from_pretrained(model, use_auth_token=True)  # loaded separately just to grab its eos_token_id

prompt = 'Can you explain why grass is green?'
sequences = llama_pipeline(
    prompt,
    top_k=10,
    num_return_sequences=1,
    eos_token_id=tokenizer.eos_token_id,
    max_length=256,
)
print("Chatbot:", sequences[0]['generated_text'])